CN116321347A

CN116321347A - Vehicle-mounted named data networking routing method based on Sarsa machine learning algorithm

Info

Publication number: CN116321347A
Application number: CN202310361915.4A
Authority: CN
Inventors: 桂易琪; 刘立
Original assignee: Yangzhou University
Current assignee: Yangzhou University
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-06-23

Abstract

The application discloses a vehicle-mounted named data networking routing method based on a Sarsa machine learning algorithm, and belongs to the field of machine learning. The method comprises the following steps. In road selection, global road information is acquired and used, by means of a fuzzy logic method and a depth-first search algorithm, to obtain an optimal global routing path. In vehicle node selection, each vehicle maintains a fixed-size Q value table that is updated through a reward function and serves as the basis for route forwarding; the vehicle searches the filtered Q value table for the best node along the guiding global routing path and sends the information toward the target node. As a result, the hit rate of data packets during information transmission is improved, the end-to-end delay is reduced, and the network routing overhead is reduced.

Description

Vehicle-mounted named data networking routing method based on Sarsa machine learning algorithm

Technical Field

The application relates to the technical field of machine learning, in particular to a vehicle-mounted named data networking routing method based on a Sarsa machine learning algorithm.

Background

A vehicular ad hoc network (VANET) is a typical application of mobile ad hoc networks. In a VANET, vehicles and roadside units (RSUs), i.e. the network nodes, are equipped with on-board computation and communication modules so that the nodes can communicate with one another. VANETs have two common communication models, vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I), and the most representative V2I mode is vehicle-to-RSU. The main design purpose of a VANET is to improve driving safety and the timeliness and reliability of accident early warning. Because a VANET is characterized by high mobility, highly dynamic topology changes, and frequent link disconnections, traditional network systems based on the IP architecture face challenges, and the host-centric TCP/IP network also suffers from problems in mobility management, short link lifetime, security, reliability, and the unavailability of network infrastructure in certain environments. Thus, the Information-Centric Networking (ICN) architecture, which is centered on information rather than on host interconnection, has emerged. Named Data Networking (NDN) is one of the most representative future network architectures within ICN. It abandons the traditional IP-based end-to-end transmission form and instead uses the name of the data as the routing object: the core of the network changes from location to content, and the communication process changes from host-to-host requests to the acquisition of data. During NDN transmission, a data name does not change when a node's position changes, so NDN adapts better to mobility; it also supports multipath routing, which alleviates the problem of communication being interrupted by address changes.
The in-network caching of NDN allows different application systems to share cached data, which improves data utilization. NDN can also cooperate across multiple networks and protocols, and therefore performs better in terms of network compatibility. These characteristics make NDN more suitable for vehicle networks with high-speed movement, and building VANETs on NDN is an inevitable trend for future research. However, although NDN-based VANET methods can support data transmission in mobile scenarios to a certain extent, the high-speed movement of vehicles still causes broken communication links between vehicles and failures in sending and receiving data, which degrades network communication quality and the user's communication experience. Research on data forwarding and routing methods between mobile nodes is therefore particularly important.

In a VANET, given the high mobility of vehicles, frequent topology changes, uneven node distribution, and environmental changes, machine learning has begun to be applied as an efficient technique in VANET routing mechanisms. Applying the Sarsa algorithm to VANET routing makes the routing better suited to complex environments. Sarsa is a classical reinforcement learning algorithm that estimates state-action values and therefore does not require a state transition model. During learning, the selection of a neighbor node is treated as the outcome of reinforcement learning, and roadside units (RSUs) are treated as fixed target nodes that periodically transmit Hello packets. A node that receives a Hello packet updates its own Q value according to the rule that the closer the node is to the RSU, the greater its Q value.

Disclosure of Invention

The application provides a vehicle-mounted named data networking routing method based on a Sarsa machine learning algorithm, which aims to solve problems of traditional routing strategies in NDN-based VANETs, such as an unsatisfactory routing hit rate and unstable routing delay.

An embodiment of a first aspect of the present application provides a method for routing a vehicle-mounted named data networking based on a Sarsa machine learning algorithm, wherein a Q value table is set inside a vehicle-mounted router, and the method includes the following steps: acquiring an optimal global routing path for transmitting information to be transmitted; calculating a feedback rewarding value of a neighbor node according to a preset rewarding function, and updating a corresponding state value in the Q value table according to the rewarding value and neighbor node information; based on a Sarsa machine learning algorithm, selecting an optimal neighbor node in the optimal global routing path as an information relay node of a next hop according to the Q value table, updating a corresponding state value in the Q value table, and continuing to select the information relay node of the next hop until the information relay node of the next hop is the target node of the information to be sent.

Optionally, in one embodiment of the present application, before acquiring the optimal global routing path for transmitting the information to be sent, the method further includes: sending global road traffic information to a roadside unit (RSU), so that the RSU calculates a suitable transmission value for each road by using fuzzy logic according to the global road traffic information, and generates the optimal global routing path for the information to be sent by using a depth-first search algorithm based on the suitable transmission value of each road.

Optionally, in one embodiment of the present application, calculating the suitable transmission value of each road using fuzzy logic according to the global road traffic information includes: calculating the fuzzy factors in the fuzzy rules and normalizing them; setting fuzzy sets and fuzzy rules, mapping the global road traffic information to the corresponding fuzzy sets by using IF-THEN rules, and outputting a fuzzy output value through the fuzzy rules; and converting the fuzzy output value into a specific value by using an output relation function and a defuzzification method, and taking the specific value as the suitable transmission value of each road.

Optionally, in one embodiment of the present application, the global road traffic information includes at least one of the vehicle running state of a road, the vehicle distribution, the total number of vehicles, and the vehicle speed difference.

Optionally, in an embodiment of the present application, the feedback reward value Re of the neighboring node calculated according to the preset reward function is:

wherein $\mathrm{sign}(x)$ is the sign function, whose output is 1 when $x$ is greater than 0 and -1 when $x$ is less than 0; $r_t$ is the received signal strength of neighbor node $i$; $r_h$ is the received signal strength threshold; $f(n_i)$ is the reward function term representing the influence of the number of neighbor nodes on the reward value; $n_i$ is the number of neighbor nodes of node $i$; $\theta$ is a correction factor; $p(r_t)$ is the reward function term representing the influence of signal strength on the reward value; $d$ is the distance formula factor; $\mathrm{dis}(i, j)$ is the distance between the current node $i$ and the neighbor node $j$; $R$ is the vehicle communication radius; and $\zeta$ is the optimal number of neighbor nodes of a vehicle.

Optionally, in one embodiment of the present application, updating the corresponding state value in the Q value table according to the reward value and the neighbor node information includes:

$$Q_i(f(i), S_i) = Q_i(f(i), S_i) + \alpha\left(R + \gamma\, Q_{i+1}(f(i+1), S_{i+1}) - Q_i(f(i), S_i)\right)$$

wherein $\alpha$ is the learning rate, $S_i$ is the state of vehicle node $i$, $\gamma$ is the decay factor of the reward, and $Q_i(f(i), S_i)$ is the reward obtained by node $i$ when taking an action in state $S_i$ according to the policy.

Optionally, in one embodiment of the present application, after transmitting the information to be sent to the target node, the method further includes: and evaluating the transmission process of the information to be transmitted by using a transmission evaluation index to determine the performance of the Sarsa machine learning algorithm, wherein the transmission evaluation index comprises at least one of interest packet hit rate, average transmission delay and network routing overhead.

Optionally, in an embodiment of the present application, when there is no next-hop information relay node available for transmission within the communication radius of the vehicle node, the information to be sent is sent to a roadside unit, and the roadside unit serves as the next-hop information relay node to transmit the information to be sent.

According to the vehicle-mounted named data networking routing method based on the Sarsa machine learning algorithm, in road selection, global road information is acquired and used, by means of a fuzzy logic method and a depth-first search algorithm, to obtain an optimal global routing path. In vehicle node selection, each vehicle maintains a fixed-size Q value table that is updated through a reward function and serves as the basis for route forwarding; the vehicle searches the filtered Q value table for the best node along the guiding global routing path and sends the information toward the target node. As a result, the hit rate of data packets during information transmission is improved, the end-to-end delay is reduced, and the network routing overhead is reduced.

Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart of a vehicle-mounted named data networking routing method based on a Sarsa machine learning algorithm according to an embodiment of the present application;

FIG. 2 is a graph comparing interest packet hit rates as the number of vehicle nodes varies under different communication radii according to an embodiment of the present application;

FIG. 3 is a graph comparing average transmission delays as the number of vehicle nodes varies under different communication radii according to an embodiment of the present application;

FIG. 4 is a graph comparing interest packet hit rates as the number of vehicle nodes varies under different overall vehicle speeds according to an embodiment of the present application;

FIG. 5 is a graph comparing average transmission delays as the number of vehicle nodes varies under different overall vehicle speeds according to an embodiment of the present application;

FIG. 6 is a graph comparing interest packet hit rates as the number of vehicle nodes varies under different routing strategies, at a sending rate of 1 data packet per cycle, according to an embodiment of the present application;

FIG. 7 is a graph comparing interest packet hit rates as the number of vehicle nodes varies under different routing strategies, at a sending rate of 2 data packets per cycle, according to an embodiment of the present application;

FIG. 8 is a graph comparing interest packet hit rates as the number of vehicle nodes varies under different routing strategies, at a sending rate of 4 data packets per cycle, according to an embodiment of the present application;

FIG. 9 is a graph comparing average transmission delays as the number of vehicle nodes varies under different routing strategies, at a sending rate of 1 data packet per cycle, according to an embodiment of the present application;

FIG. 10 is a graph comparing average transmission delays as the number of vehicle nodes varies under different routing strategies, at a sending rate of 2 data packets per cycle, according to an embodiment of the present application;

FIG. 11 is a graph comparing average transmission delays as the number of vehicle nodes varies under different routing strategies, at a sending rate of 4 data packets per cycle, according to an embodiment of the present application;

FIG. 12 is a graph comparing network routing overhead as the number of vehicle nodes varies under different routing strategies, at a sending rate of 1 data packet per cycle, according to an embodiment of the present application;

FIG. 13 is a graph comparing network routing overhead as the number of vehicle nodes varies under different routing strategies, at a sending rate of 2 data packets per cycle, according to an embodiment of the present application;

FIG. 14 is a graph comparing network routing overhead as the number of vehicle nodes varies under different routing strategies, at a sending rate of 4 data packets per cycle, according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.

Fig. 1 is a flowchart of a vehicle-mounted named data networking routing method based on a Sarsa machine learning algorithm according to an embodiment of the present application.

As shown in fig. 1, the vehicle-mounted named data networking routing method based on the Sarsa machine learning algorithm comprises the following steps:

In step S101, an optimal global routing path for transmitting information to be transmitted is acquired.

In a specific embodiment, map information is obtained, an area of suitable range is selected, the corresponding road information and road topology are drawn, and the roadside units (RSUs) and vehicle nodes are then placed on the map.

Vehicles and RSUs in the vehicular ad hoc network are both treated as network transmission nodes. The roadside unit (RSU) is a device installed at the roadside in an ETC system; it uses DSRC (Dedicated Short Range Communication) technology to communicate with the on-board unit (OBU) and thereby implements vehicle identification and electronic fee deduction. To obtain content request information more reasonably, each roadside RSU is taken as a fixed destination node that periodically sends Hello packets, each vehicle node is taken as a mobile node, and a node that receives a Hello packet updates its Q value table according to the corresponding rule.

In the embodiment of the application, the data structure of the original router of each vehicle in the vehicular ad hoc network is modified: a Q value table is added to the vehicle-mounted router in the vehicle-mounted named data network. The RSU helps ground vehicles select information relay nodes efficiently, the fixed-size Q value table reduces the network resources consumed by transmitting control information, and jointly building and sharing the Q value table within an area accelerates the convergence of the learning algorithm. In the ground network, each vehicle maintains a fixed-size Q value table whose values eventually converge to near-optimal values under the influence of the reward function. In the Sarsa algorithm, the reward function is constructed by jointly considering factors such as the received signal strength, the data transmission distance, and the probability of information collision among vehicles. In an actual vehicular network environment, the geographical environment of the area governed by an RSU does not change significantly, so with the assistance of the RSU the state of the whole VANET environment can be reduced to two dimensions: the distance between adjacent nodes and the number of neighbors of the adjacent node. By querying the Q value table and combining it with the optimal global path information in the packet header, a vehicle can forward a data packet to the next hop most suitable for transmission, which improves the overall information transmission efficiency of the network. Thus, for any node, the size of the stored Q value table depends on the number of neighbor nodes and the node communication radius R; its composition is shown in Table 1.

Table 1 Composition of the Q value table

Component	Description
Node	Each vehicle and RSU in the network
State space	S
Action set	Neighbor nodes of the node
Reward function	See the reward function formula

Optionally, in one embodiment of the present application, before acquiring the optimal global routing path for transmitting the information to be sent, the method further includes: sending global road traffic information to the roadside unit (RSU), so that the RSU calculates a suitable transmission value for each road by using fuzzy logic according to the global road traffic information, and generates the optimal global routing path for the information to be sent by using a depth-first search algorithm based on the suitable transmission value of each road. The global road traffic information includes at least one of the vehicle running state of a road, the vehicle distribution, the total number of vehicles, and the vehicle speed difference.

Before transmitting the information, a vehicle node first sends a request to the roadside unit (RSU). After receiving the information transmission request from the vehicle, the RSU uses the suitable transmission value of each road, together with a depth-first search algorithm, to obtain the optimal global routing path, and sends the IDs of the guide nodes (RSU_ID) along the path to the requesting vehicle node in a data packet.

It can be appreciated that the embodiments of the present application use fuzzy logic methods and depth first search algorithms (DFS) to obtain and utilize global road information, that is, collect and analyze information such as global road traffic through the road side unit RSU, and forward the collected and analyzed information to the requesting vehicle, and the vehicle stores the information in the header of the data packet to be forwarded after receiving the corresponding information.
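The path search performed by the RSU can be sketched as follows. This is a minimal illustration, not the method as claimed: the text does not state how the per-road values P(e) are combined along a path, so the sketch assumes the best path is the one whose weakest road has the largest P(e), with ties going to the path with fewer hops. The graph layout, the function name `best_path_dfs`, and the sample values are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical road graph: intersection -> {neighbor intersection: P(e)}.
RoadGraph = Dict[str, Dict[str, float]]


def best_path_dfs(graph: RoadGraph, src: str, dst: str) -> Optional[Tuple[List[str], float]]:
    """Enumerate simple paths with depth-first search and keep the one whose
    weakest road (minimum P(e)) is largest; ties go to the shorter path.
    The scoring rule is an assumption, not taken from the source text."""
    best: Optional[Tuple[List[str], float]] = None

    def dfs(node: str, path: List[str], bottleneck: float) -> None:
        nonlocal best
        if node == dst:
            if (best is None or bottleneck > best[1]
                    or (bottleneck == best[1] and len(path) < len(best[0]))):
                best = (list(path), bottleneck)
            return
        for nxt, p_e in graph.get(node, {}).items():
            if nxt in path:  # keep the path simple (no revisited intersections)
                continue
            path.append(nxt)
            dfs(nxt, path, min(bottleneck, p_e))
            path.pop()

    dfs(src, [src], float("inf"))
    return best


# Illustrative values only.
roads: RoadGraph = {
    "A": {"B": 0.9, "C": 0.4},
    "B": {"D": 0.7},
    "C": {"D": 0.95},
    "D": {},
}
result = best_path_dfs(roads, "A", "D")
```

Enumerating every simple path is exponential in the worst case, which is acceptable only for the small road topologies governed by a single RSU.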

The main task of the RSU in this strategy is to help the vehicle that needs to transmit information acquire the optimal global path for the transmission of information. The RSU of the present invention, as a carrier for information collection and computation, is typically not involved in the routing of the vehicle nodes. When no next hop node which can be transmitted exists in the transmission range of the vehicle node, the vehicle node sends information to the RSU, and the RSU collects road vehicle information to calculate the optimal neighbor node and help the vehicle to perform corresponding data transmission.

Specifically, each vehicle node sends a detection packet containing global road traffic information to the roadside unit in real time. From the detection packets transmitted by the vehicle nodes, the roadside unit obtains information such as the vehicle running state, vehicle distribution, vehicle speed difference, and total number of vehicles on the relevant roads, and uses fuzzy logic to derive, for each road, a value P(e) indicating how suitable the road is for information transmission. The larger the value, the more suitable the corresponding road is for information transmission.

When the RSU receives the detection packet, extracting information in the detection packet, and calculating the average value of the number of vehicles on each road section by using the following formula:

where N is the total number of segments into which each road is divided, and N(k) is the total number of vehicles on the k-th segment. The distribution of the number of vehicles across road segments is further calculated using the following formula:

Similarly, the vehicle speed distribution can be calculated:

In view of the high dynamics of the vehicles, and to avoid an update that depends only on the currently calculated value, the update is performed using the following formula:

where λ is an influencing factor, ranging from 0 to 1.
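The statistics described above can be sketched in a few lines. Only the arithmetic mean follows directly from the definitions of N and N(k); the other two functions are assumptions made for illustration: the distribution measure is taken to be a population standard deviation, and the update is taken to be a weighted blend of the previous and current values with weight λ on the previous value. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def segment_mean(counts: Sequence[float]) -> float:
    """Average number of vehicles per segment: (1/N) * sum of N(k)."""
    return mean(counts)


def segment_spread(counts: Sequence[float]) -> float:
    """Assumed distribution measure: population standard deviation of N(k)."""
    return pstdev(counts)


def smooth(previous: float, current: float, lam: float) -> float:
    """Assumed update rule: weight lam on the old value, 1 - lam on the new one,
    so the result never depends on the current measurement alone (for lam > 0)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * previous + (1.0 - lam) * current
```

For example, `smooth(10, 20, 0.5)` blends an old estimate of 10 with a new measurement of 20 into 15.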

Optionally, in one embodiment of the present application, calculating the suitable transmission value of each road using fuzzy logic according to the global road traffic information includes: calculating the fuzzy factors in the fuzzy rules and normalizing them; setting fuzzy sets and fuzzy rules, mapping the global road traffic information to the corresponding fuzzy sets by using IF-THEN rules, and outputting a fuzzy output value through the fuzzy rules; and converting the fuzzy output value into a specific value by using an output relation function and a defuzzification method, and taking the specific value as the suitable transmission value of each road.

The specific flow of estimating the suitable transmission value P(e) of a road by using fuzzy logic is as follows:

The factors involved in the fuzzy rules are calculated and normalized, which facilitates the subsequent setting of the fuzzy sets;

Fuzzy sets and fuzzy rules are set, each specific input is mapped to the corresponding fuzzy set by using IF-THEN rules, and a fuzzy output value is obtained from the fuzzy rules;

The fuzzy output value is converted into a specific value by using an output relation function and a defuzzification method, which yields the suitable transmission value of each road.
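The three steps above (normalize, apply IF-THEN rules, convert to a specific value) can be sketched as follows. The source does not give the membership functions, the rule base, or the defuzzification method, so everything concrete here is an assumption for illustration: two inputs (vehicle density and speed difference), linear "low"/"high" membership functions, four rules with constant outputs, and weighted-average defuzzification.

```python
from typing import Dict, Tuple


def normalize(value: float, lo: float, hi: float) -> float:
    """Step 1: scale a raw factor into [0, 1], clamping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def memberships(x: float) -> Dict[str, float]:
    """Assumed fuzzy sets: linear 'low' and 'high' over the unit interval."""
    return {"low": 1.0 - x, "high": x}


# Step 2 (hypothetical rule base): IF density is <a> AND speed_diff is <b>
# THEN suitability is <constant output level>.
RULES: Dict[Tuple[str, str], float] = {
    ("high", "low"): 1.0,   # many vehicles moving at similar speeds
    ("high", "high"): 0.5,
    ("low", "low"): 0.5,
    ("low", "high"): 0.0,   # few vehicles with very different speeds
}


def road_suitability(density: float, speed_diff: float) -> float:
    """Step 3: fire every rule (AND as min) and defuzzify by weighted average.
    Both inputs are expected to be normalized to [0, 1]."""
    d, s = memberships(density), memberships(speed_diff)
    num = den = 0.0
    for (d_label, s_label), output in RULES.items():
        weight = min(d[d_label], s[s_label])
        num += weight * output
        den += weight
    return num / den if den > 0 else 0.0
```

A dense road with uniform speeds, `road_suitability(1.0, 0.0)`, evaluates to 1.0 under these assumed rules, and a sparse road with widely varying speeds evaluates to 0.0.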

In step S102, a feedback reward value of the neighbor node is calculated according to a preset reward function, and a corresponding state value in the Q value table is updated according to the reward value and neighbor node information.

After receiving the data packet, which contains the optimal road path, the vehicle node uses the Sarsa algorithm to select the optimal neighbor as the next hop, thereby determining the globally optimal route.

In the vehicular NDN network, each vehicle maintains a fixed-size Q value table whose values eventually converge to near-optimal values under the influence of the reward function. In the invention, the reward function is constructed by jointly considering factors such as the received signal strength, the data transmission distance, and the probability of information collision among vehicles. By querying the Q value table and combining it with the optimal road path information in the packet header, the vehicle can forward the data packet to the next hop most suitable for transmission, which improves the overall information transmission efficiency of the network.

Optionally, in one embodiment of the present application, the feedback reward value Re of the neighboring node calculated according to the preset reward function is:

wherein $\mathrm{sign}(x)$ is the sign function, whose output is 1 when $x$ is greater than 0 and -1 when $x$ is less than 0; $r_t$ is the received signal strength of neighbor node $i$; $r_h$ is the received signal strength threshold; $f(n_i)$ is the reward function term representing the influence of the number of neighbor nodes on the reward value; $n_i$ is the number of neighbor nodes of node $i$; $\theta$ is a correction factor; $p(r_t)$ is the reward function term representing the influence of signal strength on the reward value; $d$ is the distance formula factor; $\mathrm{dis}(i, j)$ is the distance between the current node $i$ and the neighbor node $j$; $R$ is the vehicle communication radius; and $\zeta$ is the optimal number of neighbor nodes of a vehicle.

Specifically, after receiving the data, the node selects the most suitable neighbor as the next hop according to the Sarsa machine learning algorithm and by combining with the global optimal path, and then updates the corresponding state value in the Q value table. The environmental state consists of three parts: the geographical area where the current node is located; the distance between the current node and the neighbor node; number of neighbors of a neighbor node.

The design of the reward function directly affects the performance of the routing mechanism based on the Sarsa algorithm. A node must continuously update the corresponding state value function through interaction with the environment, so that it knows which neighbor node is the best one to transmit to from its current location. In the algorithm, the current node updates its local Q value according to the reward value and the optimal estimated value of the next hop, both of which are attached to the information returned by the next hop that the current node selected.

For the next hop $j$ selected by the current node $i$, the reward value $Re$ that node $i$ can obtain is calculated from the received signal strength $r_t$, the distance $\mathrm{dis}(i, j)$, and the number of neighbor nodes $n_i$, drawing on the idea of the self-organizing map (SOM), as follows:

wherein $\mathrm{sign}(x)$ is the sign function, whose output is 1 when $x$ is greater than 0 and -1 when $x$ is less than 0, $\theta$ is a correction factor, and $p(r_t)$ is calculated as follows:

For node $i$, the closer the received signal strength is to the threshold signal strength, the greater its impact on the reward value. Combining the formulas shows that when the received signal strength $r_t$ approaches the signal threshold $r_h$ from above, the longer the distance between the two nodes, the larger the reward $Re$; conversely, the larger the penalty applied to $Re$. This matches the original purpose of route transmission, namely to transmit information as far as possible while guaranteeing received signal quality, so that the number of relay hops is reduced and the overall delay is lowered. The distance formula factor $d$ is as follows:

The reward function term $f(n_i)$ is given by the above formula. In the vehicle-mounted named data network, data transmission takes place over a wireless channel. When a node has many neighbor nodes, the probability of receiving data transmitted by neighbor nodes is high, and the data may spread through the whole network and cause information redundancy and confusion; conversely, too few neighbor nodes can make the data transmission delay too long. To control the probability that the target node receives the transmitted data, a Gaussian distribution function is established, so that nodes whose number of neighbors is close to the optimal number $\zeta$ have a larger influence on the reward value.

Using the reward function formulas above, neighbor states that guarantee transmission quality are rewarded according to transmission distance and collision probability, while states that do not meet the transmission quality are penalized accordingly, so that the Q value table guides information transmission more effectively.
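The qualitative behaviour of the reward can be illustrated with a small sketch. It is not the reward formula of the application, whose exact expressions are not reproduced in this text. The sketch assumes a hypothetical composition: the sign of (r_t - r_h) decides between reward and penalty, the distance factor is taken as dis(i, j) / R, the neighbor term is a Gaussian centered on the optimal neighbor count ζ with an assumed width `sigma`, and the signal-strength term p(r_t) is omitted because its form is not given. A sign of 0 at exactly r_t = r_h is also an assumption.

```python
import math


def sign(x: float) -> int:
    """1 for x > 0, -1 for x < 0; returning 0 at x == 0 is an assumption."""
    return (x > 0) - (x < 0)


def neighbor_term(n_i: int, zeta: int, sigma: float = 2.0) -> float:
    """Gaussian-shaped term peaking when the neighbor count equals zeta.
    The width sigma is a hypothetical parameter."""
    return math.exp(-((n_i - zeta) ** 2) / (2.0 * sigma ** 2))


def distance_factor(dist: float, radius: float) -> float:
    """Assumed distance factor: dis(i, j) / R, clamped to [0, 1]."""
    return min(max(dist / radius, 0.0), 1.0)


def reward(r_t: float, r_h: float, dist: float, radius: float,
           n_i: int, zeta: int, theta: float = 1.0) -> float:
    """Hypothetical composition: a farther hop earns more when the signal is
    above the threshold and is penalized more when it is below."""
    link = sign(r_t - r_h) * theta * distance_factor(dist, radius)
    return link + neighbor_term(n_i, zeta)
```

With a 250 m radius, a 200 m hop whose signal is above the threshold scores higher than the same hop below the threshold, which reproduces the reward and penalty behaviour described in the text.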

Optionally, in one embodiment of the present application, updating the corresponding state value in the Q value table according to the prize value and the neighbor node information includes:

$$Q_i(f(i), S_i) = Q_i(f(i), S_i) + \alpha\left(R + \gamma\, Q_{i+1}(f(i+1), S_{i+1}) - Q_i(f(i), S_i)\right)$$

wherein $\alpha \in (0, 1]$ is the learning rate, which models how quickly the Q value is updated. When $\alpha$ is 0, the node selects actions only according to the current state and does not learn; when $\alpha$ is 1, the node selects actions depending entirely on the future state. $S_i$ is the state of vehicle node $i$, and $Q_i(f(i), S_i)$ is the reward obtained by node $i$ when taking an action in state $S_i$ according to the policy. $\gamma$ is the decay factor of the reward, with a value range of $[0, 1]$; it reflects that the most recent action influences the current value more than future actions do, and it determines the importance of future rewards. When $\gamma$ is set to 0, the system considers only the current reward and behaves like a greedy algorithm, but local optima do not necessarily yield a global optimum. When $\gamma$ is set to 1, the system strives for long-term high returns; however, future returns cannot be estimated accurately, so good performance cannot be guaranteed. To balance these two factors, $\gamma$ in the invention typically takes a value in $[0.5, 0.99]$, and the value used in the experiments is 0.75.

It can be understood that the vehicle node can update the corresponding state value in the Q value table according to the feedback rewarding value attached to the information returned by the route selection next hop, the neighbor node position information, the number of neighbor nodes and the optimal Q value information.

The Q value information refers to the reward $f(n_i)$ of node $i$. It is updated iteratively using the neighbor node position information, the number of neighbor nodes, and the current optimal Q value information, so that among the different neighbor nodes, the node with the maximum Q value is selected as the suitable next-hop transmission node.
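The update rule can be written as a short function. This is a minimal sketch of the standard Sarsa update applied to a dictionary-based Q table; the key layout `(state, action)`, the function name, and the default learning rate of 0.5 are assumptions, while the default γ of 0.75 is the experimental value stated in the text. In the notation of the application, `action` corresponds to f(i) and `state` to S_i.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def sarsa_update(q: QTable, state: Hashable, action: Hashable, reward: float,
                 next_state: Hashable, next_action: Hashable,
                 alpha: float = 0.5, gamma: float = 0.75) -> float:
    """Apply Q(s, a) += alpha * (reward + gamma * Q(s', a') - Q(s, a)).

    Sarsa is on-policy: the target uses the action actually chosen in the
    next state rather than the maximum over all next actions."""
    key = (state, action)
    td_target = reward + gamma * q[(next_state, next_action)]
    q[key] += alpha * (td_target - q[key])
    return q[key]


# Illustrative values only: the next hop "veh_7" already has a Q value of 2.0.
q: QTable = defaultdict(float)
q[("s1", "veh_7")] = 2.0
new_value = sarsa_update(q, "s0", "veh_3", reward=1.0,
                         next_state="s1", next_action="veh_7")
```

Here the entry for `("s0", "veh_3")` moves from 0 to 0.5 × (1.0 + 0.75 × 2.0) = 1.25.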

In step S103, based on the Sarsa machine learning algorithm, an optimal neighbor node is selected in the optimal global routing path according to the Q value table as the information relay node of the next hop, and the corresponding state value in the Q value table is updated, and the information relay node of the next hop is continuously selected until the information relay node of the next hop is the target node of the information to be sent.

When selecting a vehicle node, the vehicle maintains a Q value table of fixed size. The Q value table carries a reward function and serves as the basis for route forwarding. By searching the filtered Q value table, the best node is requested so as to guide the global routing path.

Based on the above embodiment, when there is no next-hop information relay node available for transmission within the communication radius of the vehicle node, the information to be sent is sent to a roadside unit (RSU), and the roadside unit acts as the next-hop information relay node to transmit the information to be sent.
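The next-hop choice and the roadside unit fallback described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of this application: the function name, the data layout (neighbor coordinates and one Q value per neighbor), and the placeholder identifier "RSU" are hypothetical, and the restriction to nodes on the optimal global routing path is omitted for brevity.

```python
import math


def select_next_hop(position, neighbors, q_values, radius, rsu_id="RSU"):
    """Pick the in-range neighbor with the largest Q value, else fall back to the RSU.

    position:  (x, y) of the current vehicle node
    neighbors: dict mapping node id -> (x, y)
    q_values:  dict mapping node id -> Q value (missing ids count as 0.0)
    radius:    vehicle communication radius R, in the same unit as the coordinates
    """
    in_range = {
        node: q_values.get(node, 0.0)
        for node, (x, y) in neighbors.items()
        if math.hypot(x - position[0], y - position[1]) <= radius
    }
    if not in_range:
        # No transmittable relay inside the communication radius: hand over to the RSU.
        return rsu_id
    return max(in_range, key=in_range.get)


# Hypothetical layout: "c" has the largest Q value but lies outside the 250 m radius.
neighbors = {"a": (100.0, 0.0), "b": (200.0, 0.0), "c": (400.0, 0.0)}
q_values = {"a": 0.2, "b": 0.9, "c": 5.0}
next_hop = select_next_hop((0.0, 0.0), neighbors, q_values, radius=250.0)
```

The selection here is purely greedy on the Q value, which follows the statement that the node with the maximum Q value reward is chosen; an exploration step could be added if desired.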

The routing method of the present application is described below by way of algorithm execution steps of one specific embodiment.

Algorithm input:

vehicle communication range R, overlap range of the communication signals of two adjacent RSUs

Algorithm output:

next hop J in the best path.

Optionally, in an embodiment of the present application, after the information to be sent is transmitted to the target node, the method further includes: evaluating the transmission process of the information to be sent by using transmission evaluation indexes to determine the performance of the Sarsa machine learning algorithm, wherein the transmission evaluation indexes include at least one of an interest packet hit rate, an average transmission delay, and a network routing overhead.
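As an illustration of how the three transmission evaluation indexes might be computed from simulation counters, the sketch below uses definitions that are assumptions rather than the definitions of this application: hit rate as satisfied interests divided by sent interests, average delay as the mean delay of satisfied interests, and routing overhead as control packets per satisfied interest. All names are hypothetical.

```python
from statistics import mean


def evaluate_transmission(interests_sent, delays, control_packets):
    """Return (hit_rate, average_delay, routing_overhead) under assumed definitions.

    interests_sent:  number of interest packets issued
    delays:          list of end-to-end delays, one per satisfied interest
    control_packets: number of routing control packets transmitted
    """
    satisfied = len(delays)
    hit_rate = satisfied / interests_sent if interests_sent else 0.0
    average_delay = mean(delays) if delays else float("nan")
    # Assumed definition: control packets needed per satisfied interest.
    routing_overhead = control_packets / satisfied if satisfied else float("inf")
    return hit_rate, average_delay, routing_overhead


# Hypothetical counters from one simulation run (delays in seconds).
hit_rate, average_delay, routing_overhead = evaluate_transmission(
    interests_sent=10, delays=[0.5, 1.5, 1.0, 1.0], control_packets=8
)
```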

To evaluate performance under different parameter settings, the invention obtains the three transmission evaluation indexes under different numbers of vehicles on the road, so as to verify the performance of the proposed algorithm and system.

The effect of changing parameters in global road traffic information on the performance of the present application is described below with reference to the accompanying drawings.

Figs. 2 to 5 show how different vehicle communication radii and different overall vehicle speeds affect the performance of the present invention. Figs. 2 and 3 show that the larger the communication radius of a vehicle, the more neighboring vehicles it has, the fewer hops route forwarding requires, and the lower the average end-to-end delay. When each vehicle node has a certain packet loss rate, fewer forwarding hops mean a lower total packet loss rate and, correspondingly, a higher packet arrival rate. When the routing strategy of the invention is compared with other routing strategies, the communication radius of the vehicle is set to 250 m. Figs. 4 and 5 show that with a large number of vehicles, vehicle speed has little influence on algorithm performance, so in the present invention vehicle speeds are randomly distributed in the range 0-60 km/h.
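The relation between hop count and packet arrival rate can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every hop loses packets independently with the same probability p, so that a packet survives h hops with probability (1 − p)^h; the function name and the numbers are hypothetical and are not taken from the experiments.

```python
def end_to_end_arrival_rate(per_hop_loss, hops):
    """Probability that a packet survives `hops` forwarding steps.

    Assumes independent, identical loss at every hop (an illustrative model).
    """
    return (1.0 - per_hop_loss) ** hops


# With a 10% loss per hop, halving the hop count from 6 to 3 raises the arrival rate.
short_path = end_to_end_arrival_rate(0.1, 3)
long_path = end_to_end_arrival_rate(0.1, 6)
```

Under this model the arrival rate decreases monotonically with the hop count, which is consistent with the observation that a larger communication radius, and therefore fewer hops, yields a higher packet arrival rate.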

Figs. 5 to 8 compare the packet arrival rates obtained with different routing strategies at sending rates of 1, 2, and 4 data packets per cycle, for different numbers of vehicles. As the sending rate of the source vehicle node increases, more packets are lost during routing, so the packet arrival rate of all routing strategies decreases. However, the routing strategy provided by the invention adds RSU-assisted road selection and a corresponding reward updating mechanism to the vehicle routing process. This prevents vehicles from continuously sending data packets to the same forwarding vehicle (which may not actually be the optimal next-hop node) and reduces the packet loss probability of vehicle routing. As a result, the routing strategy shows a better packet arrival rate than the other routing strategies as the sending rate increases.

Figs. 9 to 11 compare the average end-to-end delays obtained with different routing strategies at sending rates of 1, 2, and 4 data packets per cycle, for different numbers of vehicles. As the number of vehicles on the road increases, the average end-to-end delay of all routing strategies tends to decrease overall. When the number of vehicles in the network is small, for example 200 or 250 vehicles, the average end-to-end delay of ARPRL and of the routing strategy of the invention is lower. In this case the packet arrival rate of the routing strategy of the invention is lower, and the data packets that do arrive are forwarded in a lower-delay manner, so the average end-to-end delay is lower.

Figs. 12 to 14 compare the routing overhead ratios obtained with different routing strategies at sending rates of 1, 2, and 4 data packets per cycle, for different numbers of vehicles. Experiments show that the routing overhead ratio of the routing strategy increases overall with the number of vehicles in the network and then gradually levels off. Compared with the ARPRL algorithm, the method favors road sections on which vehicles are evenly distributed. This reduces the collision probability of routing information, the data retransmission probability, and the average packet forwarding hop count, which in turn affects the routing overhead.

The proposed routing method is verified using a joint implementation of SUMO and Spyder. The environment is a real urban road network extracted from OSM. SUMO is used on the map to generate vehicle trajectories, giving the vehicles different speeds and random driving routes, and NETEDIT is used to configure RSU assistance at intersections so as to monitor vehicle and road information. The road environment is a road network of approximately 3000 m × 3000 m with RSU-assisted intersections, and the RSUs at the two ends of each road ensure complete coverage of the whole road. All vehicles have the same physical configuration. Data packets are generated at random by randomly chosen source nodes, and for each generated packet a destination node is randomly selected.

The routing method and system provided by the invention address the problems of existing Q-learning-based routing protocols for VANETs, such as the slow convergence of the Q value table and a Q value table whose size is not fixed. Through the design and execution flow of the routing mechanism introduced in the embodiments, the feasibility of the method and its superiority over existing mechanisms, including traditional geographic-position-based forwarding routing algorithms and greedy routing strategies applied in urban environments, are verified.

According to the vehicle-mounted named data networking routing method based on the Sarsa machine learning algorithm, routing experiments are run on a simulated network topology built from real streets and moving vehicles. Real road environment and vehicle driving data are obtained, an optimal road is determined by the routing strategy, and the optimal next-hop vehicle node is selected according to the optimal road. Compared with traditional routing strategies, the routing method of the invention improves the hit rate of data packets during information transmission, reduces the end-to-end delay, and reduces the network routing overhead.

In this specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict one another.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "N" is at least two, such as two, three, etc., unless explicitly defined otherwise.

Any process or method description in a flow chart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application pertain.

Claims

1. A vehicle-mounted named data networking routing method based on the Sarsa machine learning algorithm, characterized by comprising the following steps:

acquiring an optimal global routing path for transmitting information to be transmitted;

calculating a feedback rewarding value of a neighbor node according to a preset rewarding function, and updating a corresponding state value in the Q value table according to the rewarding value and neighbor node information;

based on a Sarsa machine learning algorithm, selecting an optimal neighbor node in the optimal global routing path as an information relay node of a next hop according to the Q value table, updating a corresponding state value in the Q value table, and continuing to select the information relay node of the next hop until the information relay node of the next hop is the target node of the information to be sent.

2. The method of claim 1, further comprising, prior to obtaining the optimal global routing path for transmitting the information to be transmitted:

sending global road traffic information to a roadside unit (RSU), so that the roadside unit calculates the proper transmission value of each road by using fuzzy logic according to the global road traffic information, and generates the optimal global routing path of the information to be sent by using a depth-first search algorithm based on the proper transmission value of each road.

3. The method of claim 2, wherein calculating the proper transmission value of each road using fuzzy logic based on the global road traffic information comprises:

calculating a fuzzy factor in a fuzzy rule, and normalizing the fuzzy factor;

setting a fuzzy set and a fuzzy rule, mapping the global road traffic information to a corresponding fuzzy set by using an IF-THEN rule, and outputting a fuzzy output value through the fuzzy rule;

and converting the fuzzy output value by using an output relation function and a fuzzy algorithm to obtain a specific value, and taking the specific value as a proper transmission value of each road.

4. The method according to claim 2 or 3, wherein the global road traffic information includes at least one of a vehicle running state of a road, a vehicle distribution condition, total number of vehicles information, and vehicle speed difference information.

5. The method according to claim 1, wherein the feedback reward value Re of the neighboring node calculated from the preset reward function is:

wherein sign(x) is the sign function, whose output is 1 when x is greater than 0 and -1 when x is less than 0; r_t is the received signal strength of neighbor node i; r_h is the received signal strength threshold; f(n_i) is the reward function term for the influence of the number of neighbor nodes on the reward value; n_i is the number of neighbor nodes of node i; θ is a correction factor; p(r_t) is the reward function term representing the influence of signal strength on the reward value; d is a distance formula factor; dis(i, j) is the distance between the current node i and the neighbor node j; R is the vehicle communication radius; and ζ is the number of optimal neighbor nodes of a vehicle.

6. The method of claim 5, wherein updating the corresponding state value in the Q value table based on the reward value and the neighbor node information comprises:

7. The method of claim 1, wherein after transmitting the information to be transmitted to the target node, the method further comprises:

evaluating the transmission process of the information to be sent by using transmission evaluation indexes to determine the performance of the Sarsa machine learning algorithm, wherein the transmission evaluation indexes include at least one of an interest packet hit rate, an average transmission delay, and a network routing overhead.

8. The method of claim 1, wherein, when there is no next-hop information relay node available for transmission within the communication radius of a vehicle node, the information to be sent is sent to a roadside unit (RSU), and the roadside unit acts as the next-hop information relay node to transmit the information to be sent.