CN114332384A - Vehicle-mounted high-definition map data source content distribution method and device


Info

Publication number
CN114332384A
CN114332384A
Authority
CN
China
Prior art keywords
data source
vehicle
network
high-definition map
vehicle-mounted
Prior art date
Legal status
Pending
Application number
CN202111376416.XA
Other languages
Chinese (zh)
Inventor
吴帆
任炬
张尧学
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202111376416.XA
Publication of CN114332384A
Legal status: Pending

Abstract

The invention provides a vehicle-mounted high-definition map data source content distribution method and device. The method uses the content distribution mechanism and forwarding strategy of the NDN architecture, combined with reinforcement learning, to build an asynchronous data source selection framework. The framework is divided into an offline training part and an online selection part: the offline part trains a neural network model with a deep reinforcement learning algorithm, while the online part selects data sources using neural network parameters synchronized from the offline part, so that data source selection, experience trajectory collection and model training run in parallel. After a data source is selected, its content is distributed and transmitted through the named data network (NDN). The method and device avoid the throughput degradation that arises during data source transmission, avoid frequent data source switching, and effectively select the optimal vehicle-mounted high-definition map data source.

Description

Vehicle-mounted high-definition map data source content distribution method and device
Technical Field
The invention relates to the technical field of deep learning, in particular to a method and a device for distributing contents of a vehicle-mounted high-definition map data source.
Background
With the widespread deployment of information infrastructure and the rapid development of on-board sensing technology, autonomous driving has become a promising direction that will profoundly change current automotive technology. Autonomous driving is the future development trend of intelligent vehicle technology: a perception system built from a large number of sensors senses the environment around the vehicle, and according to the road structure, vehicle position, obstacle states and other information obtained by the perception system, an automatic electronic control system adjusts the vehicle's speed and direction so that it can travel safely and reliably. Unlike traditional electronic maps, autonomous vehicles require High Definition (HD) maps to support lane-level navigation. A high-definition map is a thematic map that can be divided into three layers: a road model layer, a lane model layer, and a positioning model layer. Specifically, the road model is used for navigation planning; the lane model is used for route planning based on perception of current road and traffic conditions; the positioning model is used to locate the vehicle in the map, and the lane model can assist the vehicle's perception only when the vehicle is accurately positioned on the map. The high-definition map is an essential component of autonomous driving, but its data volume is much larger than that of a traditional electronic map. It is therefore impractical to store all high-definition maps on board the vehicle; moreover, road and traffic information change in real time, so high-definition maps must be distributed in real time, with low delay and high reliability.
The traditional high-definition map selection and distribution method judges data sources by the RTT metric. However, as the number of vehicles in a coverage area increases, the traditional scheme selects data sources using a single communication model (vehicle-to-infrastructure (V2I) or vehicle-to-vehicle (V2V)), under which throughput drops significantly. Furthermore, the traditional schemes select a data source only by measuring the Round Trip Time (RTT) between the data source and the vehicle; since the vehicle state changes in real time, especially in complex mobility scenarios, and other vehicle information (e.g. speed, direction) is not considered, the RTT metric cannot guarantee the best selection result. Finally, because of mobility, the traditional methods may cause frequent data source switching, leading to frequent RTT updates and inefficient data transmission. In summary, the existing schemes cannot effectively judge the quality of the currently selected data source, and frequent vehicle movement leads to inaccurate RTT measurement and inefficient data transmission.
Named Data Networking (NDN) has great potential for vehicular networks in supporting consumer mobility, data naming, name-based routing and forwarding policies, and the like. For example, NDN naturally supports user mobility, because its content-based forwarding mechanism does not require maintaining end-to-end connections. However, designing an efficient high-definition map distribution scheme in a vehicle-mounted NDN scenario still faces several technical challenges. First, as the number of vehicles in the coverage area increases, current solutions select data sources using a communication model (vehicle-to-infrastructure (V2I) or vehicle-to-vehicle (V2V)) under which throughput drops significantly. Second, current solutions select a data source simply by measuring the Round Trip Time (RTT) between the data source and the vehicle; the vehicle state changes in real time, especially in complex mobility scenarios, and since other vehicle information (e.g. speed, direction) is not considered, the RTT metric cannot guarantee the optimal selection result. Finally, due to mobility, frequent data source switching may occur, resulting in frequent RTT updates and inefficient data transmission.
Many research works have proposed providing high-definition maps for autonomous driving; they can be divided into two categories, high-definition map construction and high-definition map distribution. However, these works do not consider complex scenarios with different vehicle states and communication models, and are mostly based on the TCP/IP architecture. Little attention has been paid to the design of high-definition map distribution mechanisms.
Disclosure of Invention
The invention provides a method and a device for distributing contents of a vehicle-mounted high-definition map data source, and aims to select and distribute an optimal vehicle-mounted high-definition map data source for automatic driving of a vehicle.
To this end, a first object of the present invention is to provide a vehicle-mounted high definition map data source content distribution method, including:
constructing a vehicle-mounted high-definition map data source selection network, wherein the vehicle-mounted high-definition map data source selection network comprises an offline training network and an online selection network;
training the offline training network by using state information of different existing vehicle-mounted high-definition map data sources as a training data set, and applying network parameters of the offline training network to the online selection network after training is completed;
in the running process of the automatic driving vehicle, inputting the state information of a plurality of vehicle-mounted high-definition map data sources received in real time into the trained online selection network, wherein the output result is the selection result of the vehicle-mounted high-definition map data sources;
distributing the content of the selected optimal vehicle-mounted high-definition map data source to the automatic driving vehicle through a named data network;
and updating the model parameters and the weight of the vehicle-mounted high-definition map data source selection network of the automatic driving vehicle receiving the data source.
The off-line training network and the on-line selection network of the vehicle-mounted high-definition map data source selection network adopt DDQN neural networks; wherein the step of training the offline training network comprises:
collecting a vehicle-mounted high-definition map data source through a collector of the off-line training network, extracting state information, and dividing a training set and a test set to be used as training data of the off-line training network;
constructing the off-line training network; the off-line training network comprises two reinforcement learning networks DQN, and the two reinforcement learning networks DQN are synchronously trained;
inputting the training set into two reinforcement learning networks DQN for training, optimizing and updating parameters of a first reinforcement learning network DQN, finding out the action with the maximum Q value in the first reinforcement learning network DQN, and calculating the Q value meeting the requirement by using a second reinforcement learning network DQN;
and constructing a loss function of the offline training network, representing the difference between the real-time learning Q value and the target Q value, judging that the offline training network is finished when the loss function is minimum, and inputting the test set into the trained offline training network to verify the training accuracy.
After the off-line training network is trained, applying the network parameters of the off-line training network to the on-line selection network; after the on-line selection network selects the data source of the vehicle-mounted high-definition map data source, experience information is generated and sent to the off-line training network so as to enable the off-line training network to carry out training iteration.
The method comprises the following steps of receiving state information of a plurality of vehicle-mounted high-definition map data sources in real time, wherein the steps comprise:
the automatic driving vehicle sends a detection interest packet outwards to search a potential vehicle-mounted high-definition map data source and state information;
and after receiving responses from the vehicle-mounted high-definition map data sources, screening them through the filter of the online selection network to obtain a vehicle-mounted high-definition map data source set, which is used as the input of the online selection network.
The action with the maximum Q value in the first reinforcement learning network DQN is found as:
a_max(s′, ω) = argmax_{a′} Q(s′, a′, ω)    (1)
wherein s′ represents the next state, a′ represents an action, and ω is the weight parameter of the first reinforcement learning network DQN;
using the action a_max(s′, ω), the target Q value is calculated in the second reinforcement learning network DQN as:
y = r + γ Q′(s′, argmax_{a′} Q(s′, a′, ω), ω⁻)    (2)
where γ represents the discount factor and r represents the reward.
The loss function of the offline training network is expressed as:
L(ω) = (1/m) Σ_{j=1}^{m} ( y_j − Q(s_j, a_j, ω) )²    (3)
wherein m represents the number of experience sets sampled for each training step;
the weights of the second reinforcement learning network DQN are updated with a balanced (soft) update method, the smooth update being calculated as shown in formula (4):
ω⁻ ← l·ω + (1 − l)·ω⁻    (4)
wherein l represents the update rate, l < 1, and ω⁻ is the weight parameter of the second reinforcement learning network DQN.
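As an illustration of formulas (1)–(4), the following sketch shows how the double-DQN target, the mini-batch loss and the smooth target-network update could be computed. It is a minimal PyTorch-style example under the assumptions stated in the comments; the function and variable names (ddqn_update, q_net, target_net, and so on) are illustrative and not part of the claimed implementation.

```python
import torch
import torch.nn.functional as F

def ddqn_update(q_net, target_net, optimizer, batch, gamma=0.9, l=0.01):
    # batch is a mini-batch of m experience tuples (s, a, r, s') sampled
    # from the replay buffer (names and shapes are assumptions)
    s, a, r, s_next = batch

    # Formula (1): pick a_max with the *current* network Q(s', a', w)
    a_max = q_net(s_next).argmax(dim=1, keepdim=True)

    # Formula (2): evaluate that action with the *target* network Q'(., w-)
    with torch.no_grad():
        y = r + gamma * target_net(s_next).gather(1, a_max).squeeze(1)

    # Formula (3): mean squared difference between target and learned Q value
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Formula (4): smooth (soft) update of the target-network weights w-
    for w_t, w in zip(target_net.parameters(), q_net.parameters()):
        w_t.data.copy_(l * w.data + (1.0 - l) * w_t.data)

    return loss.item()
```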
Before the vehicle-mounted high-definition map data source selection is carried out, triggering the vehicle-mounted high-definition map data source selection; the reason for judging and triggering the selection of the vehicle-mounted high-definition map data source at least comprises the following steps: selecting a data source for an autonomous vehicle when the vehicle is initialized; selecting a new data source for the autonomous vehicle when either the quality of the connected link deteriorates or the connection is broken;
whether the quality of the connected link has deteriorated or the connection has been broken is judged as follows: the packet loss rate P of the data source currently connected to the autonomous vehicle is calculated, the current timestamp is used as a random seed, and the hit switching probability is computed with a random function (formula (5)): a random number within 100 is generated by the rand() function, and if it falls within the switching probability determined by the packet loss rate, the probabilistic hit indicates that the data source switch is executed; this is expressed by the switching flag
H = 1, if P ≥ P_max or the random number hits the switching probability; H = 0, otherwise    (6)
wherein H represents the data source switching judgment flag and P_max represents the maximum packet loss rate; that is, once the link packet loss rate exceeds the maximum packet loss rate, the data source is switched directly, without using the probabilistic calculation for the switching judgment.
Wherein, after receiving responses from the vehicle-mounted high-definition map data sources, the step of screening through the filter of the online selection network includes:
the selector of the online selection network receives the state information s_{t,i} = (N_{t,i}, M_{t,i}, V_{t,i}, D_{t,i}) of the i-th data source at time t, where N_{t,i} represents the round trip time (RTT) between the i-th data source and the vehicle at time t; M_{t,i} represents the time interval at which the i-th data source sends data packets; V_{t,i} represents the driving speed of the i-th data source at time t; and D_{t,i} represents the distance between the i-th data source and the requesting vehicle at time t;
N_{t,i} is a smoothed RTT value: the smaller the RTT, the smaller the round-trip delay of the data source and the better the network performance. When RTT values from the same data source are obtained multiple times, smoothing them is more reliable; using the smoothing method of the Jacobson/Karels algorithm, the calculation is:
N_{t,i} = u·N_{t−1,i} + e·(R_{t,i} − N_{t−1,i})    (7)
wherein R_{t,i} represents the currently observed instantaneous RTT value, u = 1 and e = 0.125;
M_{t,i} is a smoothed interval time: for the same bandwidth, the larger the interval, the more idle the data source and the larger its remaining available bandwidth; conversely, the smaller the interval, the smaller the remaining available bandwidth. Its calculation is:
M_{t,i} = (1 − σ)·M_{t−1,i} + σ·(Data_{t,i} − Data_{t−1,i})    (8)
wherein σ = 0.5, Data_{t,i} represents the sending time of the current state information packet and Data_{t−1,i} that of the previous one, their difference giving the time interval;
V_{t,i} represents the vehicle speed: V_{t,i} > 0 indicates that the corresponding data source drives in the same direction as the requesting autonomous vehicle, and V_{t,i} < 0 indicates the opposite direction; the lower the speed, the more stable the data source;
D_{t,i} represents the distance between the i-th data source and the requesting autonomous vehicle at time t: D_{t,i} > 0 indicates that the corresponding data source is in front of the autonomous vehicle, and D_{t,i} < 0 indicates that it is behind; the closer the data source, the higher the stability;
the candidate data sources are sorted in turn by each of the state dimensions they contain, and the data source with the best value of each state is selected; the 4 states thus each select one optimal data source, and the 4 optimal data sources together contain 4 groups of 16 state values, which are used as the input of the online selection network.
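A minimal sketch of this filtering step is given below. It assumes each candidate data source is represented by the tuple (N, M, V, D) described above and picks, for each of the four state dimensions, the candidate that is best for that dimension, concatenating the winners' tuples into the 16-value network input; all function and variable names are assumptions made for illustration.

```python
from typing import List, Tuple

# (N: smoothed RTT, M: interval time, V: relative speed, D: signed distance)
State = Tuple[float, float, float, float]

def build_network_input(candidates: List[State]) -> List[float]:
    """Pick the best data source per state dimension and flatten into 16 values."""
    # Ranking rules taken from the text: smaller RTT, larger interval,
    # lower speed and smaller distance are considered better.
    best = [
        min(candidates, key=lambda s: s[0]),       # smallest smoothed RTT
        max(candidates, key=lambda s: s[1]),       # largest interval time
        min(candidates, key=lambda s: abs(s[2])),  # lowest relative speed
        min(candidates, key=lambda s: abs(s[3])),  # closest distance
    ]
    # With fewer than four candidates the same source can win several
    # dimensions, so the 16-value input is effectively padded by duplication.
    return [value for source in best for value in source]
```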
The step of sorting by each state dimension and selecting the data source with the best state includes:
calculating the optimal value of each state among the candidates, and then calculating, by formula (9), the score G_{t,i} of selecting data source i when action a_t is executed, a weighted combination of the normalized state values whose weights are the action parameters;
wherein Max{G_{t,i}} denotes the highest score of the data source i selected when action a_t is executed in state s_t;
the state parameters are normalized by adjusting the parameters output by the online selection network, the value ranges of the action parameters are adjusted, a mapping between data sources and action parameter values is constructed, and the data source with the highest score is taken as the final selection result.
After the step of generating experience information and sending it to the offline training network for training iteration, the method also includes setting the reward corresponding to the data source. The reward function is given in formula (10): it combines the link throughput, the link duration and the RTT value of the currently connected data source (a smoothed RTT computed through N_{t,i}), increasing with throughput and link duration and decreasing with RTT; the other exponent coefficients take values greater than 1 and at most 2, and 0 < φ ≤ 0.5.
The method comprises the following steps of updating model parameters and weights of a vehicle-mounted high-definition map data source selection network of an automatic driving vehicle receiving a data source, wherein the steps comprise the following steps:
a vehicle-mounted high-definition map data source selection network of the automatic driving vehicle learns an environment sample and updates a network model and model weight through interaction with a real environment;
and updating the vehicle-mounted high-definition map data source selection network model based on a federal learning method.
Wherein, in the step of interacting with the real environment, learning environment samples, and updating the network model and model weights:
the network model and model weights are updated with the Federated Averaging (FedAvg) algorithm; each autonomous vehicle trains the vehicle-mounted high-definition map data source selection network model with local data and environment states and, after multiple iterations, transmits its weight result to the infrastructure RSU; the infrastructure RSU computes the weighted average of the weight results collected from all autonomous vehicles as:
ω = Σ_k (n_k / n) · ω_k    (11)
wherein n_k is the number of local weight updates of autonomous vehicle k, and n is the total number of aggregation-weight updates at the RSU side. The RSU updates the weight once per day for each autonomous vehicle, in particular when the autonomous vehicle first associates to an RSU or to a new RSU.
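On the RSU side, the weight aggregation of formula (11) could look like the sketch below; it assumes each vehicle k reports its locally updated weight vector ω_k together with its local update count n_k, and the names and data layout are illustrative only.

```python
import numpy as np

def fedavg_aggregate(local_weights, local_counts):
    """Formula (11): aggregated weight = sum_k (n_k / n) * w_k."""
    n = float(sum(local_counts))                 # total update count n at the RSU
    aggregated = np.zeros_like(local_weights[0], dtype=float)
    for w_k, n_k in zip(local_weights, local_counts):
        aggregated += (n_k / n) * w_k            # vehicle k contributes with weight n_k / n
    return aggregated
```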
The second purpose of the invention is to provide a vehicle-mounted high-definition map data source content distribution device, which comprises the following components:
a network construction module, used for constructing a vehicle-mounted high-definition map data source selection network, which comprises an offline training network and an online selection network;
the network training module is used for training the offline training network by using the state information of different existing vehicle-mounted high-definition map data sources as a training data set, and applying the network parameters of the offline training network to the online selection network after the training is finished;
the data source selection module is used for inputting the state information of a plurality of vehicle-mounted high-definition map data sources received in real time into the trained online selection network in the running process of the automatic driving vehicle, and the output result is the selection result of the vehicle-mounted high-definition map data sources;
the distribution module is used for distributing the content of the selected optimal vehicle-mounted high-definition map data source to the automatic driving vehicle through a named data network;
and the updating module is used for updating the model parameters and the weight of the vehicle-mounted high-definition map data source selection network of the automatic driving vehicle receiving the data source.
Different from the prior art, the vehicle-mounted high-definition map data source content distribution method provided by the invention uses the content distribution mechanism and forwarding strategy of the NDN architecture, combined with reinforcement learning, to build an asynchronous data source selection framework. The framework is divided into an offline training part and an online selection part: the offline part trains the neural network model with a deep reinforcement learning algorithm, while the online part selects data sources using neural network parameters synchronized from the offline part, so that data source selection, experience trajectory collection and model training run in parallel. After a data source is selected, its content is distributed and transmitted through the named data network (NDN). The method and device avoid the throughput degradation that arises during data source transmission, avoid frequent data source switching, and effectively select the optimal vehicle-mounted high-definition map data source.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flow chart of a vehicle-mounted high-definition map data source content distribution method provided by the invention.
Fig. 2 is a schematic structural diagram of a vehicle-mounted high-definition map data source selection network in the vehicle-mounted high-definition map data source content distribution method provided by the invention.
Fig. 3 is a schematic diagram of a packet structure of an interest packet and a data packet for exploration in the vehicle-mounted high-definition map data source content distribution method provided by the invention.
FIG. 4 is a logic diagram of model and weight updating in the content distribution method of the vehicle-mounted high-definition map data source provided by the invention.
Fig. 5 is a schematic structural diagram of a vehicle-mounted high-definition map data source content distribution device provided by the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The invention provides a vehicle-mounted high-definition map data source content distribution method, which adopts a deep reinforcement learning algorithm to learn and train a neural network according to collected empirical data so as to generate a selection strategy for selecting a data source. To simulate the dynamics of a vehicle scene, the vehicle speed, vehicle direction of travel, interval time, and smoothed RTT are used to represent the state information of the data source. To evaluate the performance of the actions of the selected data source, a reward function is defined that takes into account the link throughput, the link duration and the RTT. In order to operate the vehicle-mounted high-definition map selection network, an off-line training and on-line decision mechanism is proposed to find a proper data source in a vehicle-mounted scene, which means that a neural network training process is performed off-line by using collected empirical data and is updated iteratively to assist in on-line selection. In an off-line training and on-line decision mechanism, an algorithm based on asynchronous reinforcement learning is designed, track acquisition and neural network training are decoupled, and parallel execution of data source selection, empirical track acquisition and model training is realized. The method comprises the following specific steps:
fig. 1 is a schematic flow chart of a method for distributing contents of a vehicle-mounted high-definition map data source according to an embodiment of the present invention. The method comprises the following steps:
step 101, constructing a vehicle-mounted high-definition map data source selection network, wherein the vehicle-mounted high-definition map data source selection network comprises an offline training network and an online selection network.
In order to support real-time data source selection, the vehicle-mounted high-definition map selection network constructed by the invention comprises an offline training network and an online selection network, and is shown in fig. 2. The method mainly aims to decouple data acquisition and model training, so that an online algorithm can select a data source and acquire data in real time, and an offline algorithm synchronously performs model training and iteration. In the off-line part, empirical information is collected from the interaction between the selector and the environment using a collector and then stored in a replay buffer library. In addition, the trainer uses a deep Q network to train the strategy from the collected experience sets. In the online portion, the selector obtains data source information as a status by observing the environment and takes action through policy. Each time an iteration is completed, the selector shares experience with the off-line part of the collector and synchronizes the weights of the strategic neural network.
Specifically, the network model structure constructed by the invention is as follows:
agent (Agent): agents on autonomous vehicles that require map updates to perform data source selection and switching decisions. And after the selection is triggered, collecting and processing data source state information, then executing corresponding selection actions after output is obtained from the neural network, waiting for obtaining rewards, and learning by the intelligent agent in an environment interaction mode.
State (s_t): due to the heterogeneity of data sources (RSU or vehicle), state information of various data sources can be acquired, and 4 parameters are used to represent the current state information of a data source. The agent uses the state s_{t,i} to denote the state of the i-th data source obtained when data source selection is triggered at time t, i.e. s_{t,i} = (N_{t,i}, M_{t,i}, V_{t,i}, D_{t,i}), wherein N_{t,i} represents the RTT value of the i-th data source at time t; M_{t,i} represents the time interval at which the i-th data source sends data packets; V_{t,i} represents the driving speed of the i-th data source at time t; and D_{t,i} represents the distance between the i-th data source and the requesting vehicle at time t; the specific calculation methods are developed later.
Action (a_t): a_t represents the action performed by the agent at time t. In the design of the invention, a_t does not directly select a particular data source; instead, the agent is provided with a selection method, i.e. the neural network determines a particular set of action parameters (α_t, β_t, θ_t, μ_t). The action parameters form discrete sets, and a specific action is then mapped to the corresponding data source selection; the specific calculation method is developed later.
Reward (r_t): after the agent has performed action a_t, if data source selection is triggered again at time t+1, the current data source state s_{t+1} is acquired at the same time, and the agent can calculate the reward r(s_t, a_t) obtained from the last action according to the state of the link. The agent's goal is to maximize the cumulative reward, i.e. the expectation E[Σ_t γ^t · r_t], referred to as the γ-discounted cumulative reward, with discount factor γ ∈ (0, 1]; the later the time, the lower the reward weight, and the expectation E is taken over all random variables. The cumulative reward judges the quality of a given policy: a better policy achieves a higher cumulative reward. The specific calculation method and the reward function definition are developed later.
Policy (π): the agent needs to continuously interact with the environment to learn a policy π that achieves good results, which guides the agent in selecting the next action. The quality of the policy is judged with the cumulative reward mentioned above. The present invention uses a deterministic policy as the policy search method, i.e. π(s_t) = a_t, meaning that when the agent observes state s_t it executes action a_t. The process of data source selection can therefore be represented as a series of mapping pairs between data source states and actions, through which the agent can select the best action in the corresponding state. The invention uses a deep reinforcement learning algorithm, so no mapping table needs to be built; the action-selection neural network of fig. 1 represents the policy. The action-selection neural network can conveniently process input states and output the set of action parameters.
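To make the agent/state/action/reward/policy notation concrete, the following sketch groups the four state components into a record and expresses the deterministic policy π(s_t) = a_t as a forward pass through the action-selection network; the class and attribute names are assumptions used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class SourceState:
    """State s_{t,i} of candidate data source i at time t."""
    rtt: float        # N_{t,i}: smoothed round-trip time
    interval: float   # M_{t,i}: packet sending interval
    speed: float      # V_{t,i}: relative driving speed (sign encodes direction)
    distance: float   # D_{t,i}: signed distance to the requesting vehicle

def deterministic_policy(action_net, s_t):
    """pi(s_t) = a_t: the network maps the observed state to one discrete action
    (a combination of action parameters), here chosen as the argmax output."""
    values = action_net(s_t)        # one value per discrete action
    return int(values.argmax())     # a_t
```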
Step 102: and training the offline training network by using state information of different existing vehicle-mounted high-definition map data sources as a training data set, and applying network parameters of the offline training network to the online selection network after training is finished.
The off-line training network and the on-line selection network of the vehicle-mounted high-definition map data source selection network adopt DDQN neural networks; wherein the step of training the offline training network comprises:
collecting a vehicle-mounted high-definition map data source through a collector of the off-line training network, extracting state information, and dividing a training set and a test set to be used as training data of the off-line training network;
constructing the off-line training network; the off-line training network comprises two reinforcement learning networks DQN, and the two reinforcement learning networks DQN are synchronously trained;
inputting the training set into two reinforcement learning networks DQN for training, optimizing and updating parameters of a first reinforcement learning network DQN, finding out the action with the maximum Q value in the first reinforcement learning network DQN, and calculating the Q value meeting the requirement by using a second reinforcement learning network DQN;
and constructing a loss function of the offline training network, representing the difference between the real-time learning Q value and the target Q value, judging that the offline training network is finished when the loss function is minimum, and inputting the test set into the trained offline training network to verify the training accuracy.
In the offline training part, the DDQN algorithm is used as the reinforcement learning method to train the Q-learning-based model and reduce over-estimation. Q-learning has two cores: it is off-policy and it uses temporal-difference (TD) learning. Off-policy means that the policy used to select actions and the policy used to update the Q value are not the same: actions are selected with a greedy strategy, while the Q value is updated with a deterministic strategy, i.e. the action with the maximum Q value is chosen. Temporal difference means that the current value function is updated with the TD target, the sum of discounted future returns. First, to improve the convergence of the algorithm, two neural networks are designed for synchronous training: the current neural network Q (weight parameter ω) is used to update the model's weight parameters, and the target neural network Q′ (weight parameter ω⁻) is responsible for calculating the target Q value. In addition, to reduce the over-estimation caused by value iteration or parameter updates (i.e. the estimated value function becoming larger than the true value function, which finally biases the model), the invention first finds the action with the maximum Q value in the current network Q:
a_max(s′, ω) = argmax_{a′} Q(s′, a′, ω)    (1)
Then, action a_max(s′, ω) is evaluated in the target network Q′ to obtain the required target Q value y:
y = r + γ Q′(s′, argmax_{a′} Q(s′, a′, ω), ω⁻)    (2)
where γ represents the discount factor and r represents the reward.
In the offline training part, the experience replay library stores past experience tuples (s_t, a_t, r_t, s_{t+1}). Past experience is recorded for every iteration, which is crucial for model training in a real-time vehicular network. For each iteration, the current network is trained by randomly sampling m experience sets from the experience replay library; there is no temporal correlation among these m sets. The goal of the loss function L is to minimize the difference between the learned Q value and the target Q value, calculated as:
L(ω) = (1/m) Σ_{j=1}^{m} ( y_j − Q(s_j, a_j, ω) )²    (3)
In addition, the invention updates the weights ω⁻ of the target network Q′ with a balanced (soft) update method to improve the stability of the target network Q′; the smooth update is calculated as:
ω⁻ ← l·ω + (1 − l)·ω⁻    (4)
wherein l represents the update rate and l < 1.
In the online selection part, the environment state in the real-world vehicle scene is observed while the vehicle is driving, actions are taken, and the experience is collected into the accumulator for offline model training. The selection policy network has the same structure as the neural network of the offline part: the input is the environment state and the output is an action value. Each time a data source acquisition and switching request is initiated, the selector first obtains the weights ω trained by the current neural network Q from the application-layer trainer and synchronizes them into the action-selection neural network A (weight parameter ω_A). Once the data selection mechanism is triggered, the vehicle first sends a probe interest packet to find potential data sources and state information; a filter in the selector screens the data sources and their states, eliminating those that do not meet the requirements, to obtain the data source set DS = {DS1, DS2, DS3, DS4}. For each action, after the action value output by the selection policy network A is obtained, an ε-greedy algorithm decides whether to explore randomly: with probability (1 − ε) the action with the maximum value function is selected, and with probability ε a random action is selected. This explores the action space in the initial stage and avoids falling into a local optimum. As training and exploration proceed, frequent exploration is no longer needed, and the number of explorations is reduced with a scaling-down factor δ, i.e. ε ← δ·ε, which helps the offline training algorithm converge. After the selector takes action a_t in state s_t, it receives the reward r_t and the next state s_{t+1}, and synchronizes the experience tuple (s_t, a_t, r_t, s_{t+1}) to the experience replay library of the offline part. In addition, the online selector also performs the specific data request actions at the network layer and obtains the corresponding rewards. The asynchronous design of offline training and online selection improves the efficiency of algorithm training.
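A schematic version of the online selection loop described above is sketched below: it synchronizes the latest weights from the offline trainer, chooses an action ε-greedily, decays ε by the factor δ, and pushes the experience tuple back to the shared replay buffer. All interface names (trainer, replay_buffer, env, and so on) are assumptions introduced only for this sketch.

```python
import random

def online_select(action_net, trainer, replay_buffer, env, epsilon, delta):
    # Synchronize the action-selection network A with the trainer's current weights
    action_net.load_state_dict(trainer.current_weights())

    s_t = env.observe_state()                   # the 16 filtered state values
    if random.random() < epsilon:
        a_t = env.random_action()               # explore with probability epsilon
    else:
        a_t = int(action_net(s_t).argmax())     # exploit: highest-valued action

    r_t, s_next = env.step(a_t)                 # request map data from the chosen source
    replay_buffer.append((s_t, a_t, r_t, s_next))   # share experience with the offline part

    return a_t, delta * epsilon                 # epsilon <- delta * epsilon (less exploration)
```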
S103: and in the running process of the automatic driving vehicle, inputting the state information of the plurality of vehicle-mounted high-definition map data sources received in real time into the trained online selection network, wherein the output result is the selection result of the vehicle-mounted high-definition map data sources.
In the present invention, data source selection is triggered for two reasons: 1) the vehicle must select a data source when it initializes (by default, the minimum RTT is the selection criterion); 2) the quality of the connected link deteriorates or the connection is broken. The whole data transmission process is divided into consecutive periods of equal length; the period adopts the Beacon frame interval of the IEEE 802.11 protocol, i.e. 100 ms per period. The packet loss rate of the link is the most direct way to judge link quality, but switching the data source as soon as a packet is lost would cause frequent switching and hurt link throughput. The invention therefore designs a periodic, probability-based triggering scheme driven by the packet loss rate, whose purpose is to judge whether to switch the data source according to the packet loss rate, so that the vehicle can switch to a new data source when the link quality is poor while avoiding the extra overhead caused by frequent switching. Candidate data sources generally fall into two types: other vehicles, and infrastructure RSUs fixedly installed along the road.
In each period, the packet loss rate P of the currently connected data source is first calculated; then the current UNIX timestamp is used as a random seed and a hit switching probability is computed with a random function (formula (5)): a random number within 100 is generated by the rand() function, and if it falls within the switching probability determined by the packet loss rate, a probabilistic hit occurs and the data source switch is executed (formula (6)), wherein H represents the data source switching judgment flag and P_max represents the maximum packet loss rate (30% by default); that is, once the link packet loss rate exceeds the maximum packet loss rate, the switching judgment is no longer made probabilistically and the data source is switched directly.
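The periodic, packet-loss-driven trigger could be sketched as follows (evaluated once per 100 ms period). The exact form of formulas (5) and (6) is only described in words here, so the rule below — switch immediately above P_max, otherwise switch with a probability that grows with the packet loss rate — is an assumption consistent with the description, and all names are illustrative.

```python
import random
import time

def should_switch(packet_loss_rate: float, p_max: float = 0.30) -> bool:
    """Decide once per period whether to switch the data source (flag H)."""
    if packet_loss_rate >= p_max:
        return True                        # H = 1: switch directly, no probability check
    random.seed(time.time())               # current UNIX timestamp as the random seed
    draw = random.randint(0, 99)           # random number within 100 from rand()
    # Assumed hit rule: the draw falls below the loss rate scaled to 100
    return draw < packet_loss_rate * 100   # H = 1 on a probabilistic hit, else H = 0
```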
If the selection mechanism is triggered, the vehicle will send probe interest packets and the selector will collect status information by receiving the packets. This means that the data source selection mechanism is activated. The selector takes the collected state as network input, the selected data source as action output, and finally corresponding rewards in the vehicle are obtained and recorded. For each cycle, the purpose of the selector is to select the best data source in the current environment. If the selection mechanism is not triggered, the selector will not select a new data source and continue to transmit current data from the data source, which means that the data source selection based method will not be activated.
Whenever data source selection is successfully triggered, the requesting vehicle first broadcasts a probe interest packet. The candidate data sources are other vehicles or infrastructure deployed along the road. The original link is not interrupted while the probe interest packet is being sent; however, once a new data source is selected, the original link is disconnected and switched to the new data source. When a data source receives the probe interest packet, it adds additional state information (i.e. a single-hop tag, interval time, distance, and speed) to the returned data packet. After the probe interest packet is sent, the requesting vehicle sets a waiting timer (50 ms by default), and when the timer expires the selector is triggered to select a data source. The present invention therefore extends the existing probe interest packets and Data packets with some new fields to collect this information, as shown in fig. 3. For probe interest packets, the vehicle sends probe packets with additional information, i.e. its location and direction of travel. When a data source responds to the vehicle, it first calculates its distance to the requesting vehicle, and adds the additional information, i.e. interval time and speed, to the returned data packet.
After the vehicle-mounted high-definition map data source responses are received, the screening step performed by the filter of the online selection network is as follows:
the selector of the online selection network receives the state information s_{t,i} = (N_{t,i}, M_{t,i}, V_{t,i}, D_{t,i}) of the i-th data source at time t, where N_{t,i} represents the round trip time (RTT) between the i-th data source and the vehicle at time t; M_{t,i} represents the time interval at which the i-th data source sends data packets; V_{t,i} represents the driving speed of the i-th data source at time t; and D_{t,i} represents the distance between the i-th data source and the requesting vehicle at time t.
N_{t,i} is a smoothed RTT value: the smaller the RTT, the smaller the round-trip delay of the data source and the better the network performance. When RTT values from the same data source are obtained multiple times, smoothing them is more reliable; using the smoothing method of the Jacobson/Karels algorithm:
N_{t,i} = u·N_{t−1,i} + e·(R_{t,i} − N_{t−1,i})    (7)
wherein R_{t,i} represents the currently observed instantaneous RTT value, u = 1 and e = 0.125.
M_{t,i} is a smoothed interval time: for the same bandwidth, the larger the interval, the more idle the data source and the larger its remaining available bandwidth; conversely, the smaller the interval, the smaller the remaining available bandwidth:
M_{t,i} = (1 − σ)·M_{t−1,i} + σ·(Data_{t,i} − Data_{t−1,i})    (8)
wherein σ = 0.5, Data_{t,i} represents the sending time of the current state information packet and Data_{t−1,i} that of the previous one, their difference giving the time interval.
V_{t,i} represents the vehicle speed: V_{t,i} > 0 indicates that the corresponding data source drives in the same direction as the requesting autonomous vehicle, and V_{t,i} < 0 indicates the opposite direction; the lower the speed, the more stable the data source.
D_{t,i} represents the distance between the i-th data source and the requesting autonomous vehicle at time t: D_{t,i} > 0 indicates that the corresponding data source is in front of the autonomous vehicle, and D_{t,i} < 0 indicates that it is behind; the closer the data source, the higher the stability.
The data source states are the input of the neural network, whose input size must be fixed, but the number of data sources that can be probed each time is not fixed, so the number of data source states is not fixed either. The method adopted by the invention is to screen the data sources once, sorting them in turn by each state dimension, so that the data source with the best value of each state can be filtered out (for example, the data source DS1 with the minimum RTT). The 4 states thus each select one best data source, and the 4 best data sources contain 4 groups of 16 state values in total, which the selector takes as input. If fewer than 4 data sources are discovered, existing data sources are chosen at random to supplement the input; for example, if a vehicle only acquires one data source, the selector copies its 4 state values 4 times to form the 16 input states. The main purpose of this approach is to fix the number of input states.
Data source selection action design. Even though there are only two types of data sources (i.e. infrastructure RSUs and vehicles), a vehicle acting as a data source changes dynamically because it moves, which means the action space is not fixed across different vehicle scenarios. However, the number of data sources is limited by the coverage of vehicles and RSUs over the duration of a link. The invention therefore defines actions with the idea of ranking: based on the state definitions above, the optimal value of each state among the candidates is first computed, and the score of selecting data source i when executing action a_t is then calculated by formula (9), a weighted combination of the normalized state values whose weights are the action parameters. Max{G_{t,i}} denotes the highest score of the data source i selected when action a_t is executed in state s_t. The action score G_{t,i}, used to evaluate the selection of data source i, is obtained by normalizing the state parameters and adjusting the corresponding parameter values (α_t, β_t, θ_t, μ_t, each corresponding to one state parameter, i.e. the action parameters).
The value ranges of the action parameters are discrete sets (for example β_t ∈ {0, 0.2, 0.8} and θ_t, μ_t ∈ {0, 0.5}); the four sets yield 64 combinations as the action parameters for each state change, which means the agent has 64 actions to choose from. A mapping between data sources and action parameter values is then constructed from the output of the selection neural network. Finally, when the vehicle executes action a_t in state s_t, the selected data source i is the one with the highest score G_{t,i}. The selection neural network outputs the action parameters, the score G_{t,i} of selecting each data source is calculated and mapped to the data source set DS = {DS1, DS2, DS3, DS4}, and a specific data source is finally selected.
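The mapping from a discrete action to a concrete data source could be sketched as follows: the chosen action indexes one combination of the parameters (α_t, β_t, θ_t, μ_t), the score G_{t,i} of each candidate is computed as a weighted sum of its normalized state values, and the highest-scoring candidate is returned. The parameter sets and the weighted-sum form of formula (9) below are assumptions, since the text gives them only partially.

```python
from itertools import product

# Illustrative discrete parameter sets; the text gives only partial values
# (beta_t in {0, 0.2, 0.8}, theta_t and mu_t in {0, 0.5}) and a total of 64 actions.
PARAM_SETS = {
    "alpha": [0.0, 0.5, 1.0, 2.0],
    "beta":  [0.0, 0.2, 0.5, 0.8],
    "theta": [0.0, 0.5],
    "mu":    [0.0, 0.5],
}
ACTIONS = list(product(*PARAM_SETS.values()))   # 4 * 4 * 2 * 2 = 64 combinations here

def select_source(action_index: int, normalized_states: dict) -> str:
    """normalized_states[name] = (n, m, v, d), each value already normalized to [0, 1]."""
    alpha, beta, theta, mu = ACTIONS[action_index]
    scores = {
        name: alpha * n + beta * m + theta * v + mu * d   # assumed form of G_{t,i}
        for name, (n, m, v, d) in normalized_states.items()
    }
    return max(scores, key=scores.get)                    # data source with the highest score
```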
Design of the reward function: to ensure that the network can learn from past experience, a corresponding reward is returned each time an action is performed, representing the overall benefit of the agent following the policy. The reward design follows these principles: 1) increase throughput as much as possible — throughput is the most basic metric of map data transmission and means the vehicle can acquire map data quickly and efficiently; 2) extend the duration of the link — this avoids the extra overhead caused by frequently switching data sources, keeps the link stable and increases throughput; 3) reduce transmission delay — in an autonomous driving scenario, high-definition map distribution places higher demands on transmission delay, and low latency means the data source can respond quickly to the vehicle's request and packet queuing time is reduced. The reward function is therefore defined in formula (10): it combines the link throughput, the link duration and the RTT value of the currently connected data source (a smoothed RTT computed through N_{t,i}), increasing with throughput and link duration and decreasing with RTT; the exponent coefficients take values greater than 1 and at most 2, and 0 < φ ≤ 0.5.
The objective in designing the reward function is to maximize the expected cumulative discounted reward, i.e. the expectation E[Σ_t γ^t · r_t], known as the γ-discounted cumulative reward. The discount factor γ ∈ (0, 1] determines the time scale of the reward: the later the time, the lower the reward weight. The expectation E is taken over all random variables.
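Because only the design principles of reward formula (10) are stated, the sketch below simply follows them — the reward grows with link throughput and link duration and shrinks with the smoothed RTT — and then accumulates γ-discounted rewards; the coefficient names and the exact functional form are assumptions.

```python
def reward(throughput: float, duration: float, rtt: float,
           a: float = 1.5, b: float = 1.5, phi: float = 0.5) -> float:
    """Assumed shape of formula (10): higher throughput and longer link duration
    raise the reward, a larger smoothed RTT lowers it (1 < a, b <= 2, 0 < phi <= 0.5)."""
    return throughput ** a + duration ** b - phi * rtt

def discounted_return(rewards, gamma: float = 0.9) -> float:
    """Gamma-discounted cumulative reward: sum over t of gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```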
And 104, distributing the content of the selected optimal vehicle-mounted high-definition map data source to the automatic driving vehicle through a named data network.
Named Data Networking (NDN) is a revolutionary future Internet architecture in which named data replaces IP: routing is performed directly on content names, data transmission follows a publish–request–respond pattern, and point-to-multipoint content distribution is achieved efficiently. The NDN routing mechanism keeps a Forwarding Information Base (FIB) similar to an IP routing table and adds a Pending Interest Table (PIT) and a Content Store (CS): the FIB matches a suitable forwarding interface, the CS caches content, and the PIT records received request (interest) packets, so that when the data packet corresponding to a request in the PIT is sent back it is forwarded to the corresponding interface. NDN forwards using longest-prefix matching, similar to IP, based on the information stored in the FIB and PIT. NDN not only avoids network conflicts and congestion by design and removes the dependence of transmission on end-to-end connections, realizing multi-link routing, but also achieves nearby retrieval and load balancing through in-network caching, greatly improving the performance, efficiency and reliability of large-scale content distribution.
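The CS/PIT/FIB processing described above can be illustrated with the simplified sketch below. A real NDN forwarding daemon does far more (longest-prefix matching over hierarchical names, multiple faces, entry lifetimes), so this is only a schematic of the three table lookups, with all interface names assumed.

```python
def on_interest(name, in_face, cs, pit, fib):
    """Simplified NDN interest processing with Content Store, PIT and FIB."""
    if name in cs:                        # Content Store: serve from the in-network cache
        in_face.send_data(cs[name])
        return
    if name in pit:                       # Pending Interest Table: aggregate duplicate requests
        pit[name].add(in_face)
        return
    pit[name] = {in_face}                 # remember where to return the data later
    out_face = fib.longest_prefix_match(name)   # FIB: pick a forwarding interface by name
    out_face.send_interest(name)

def on_data(name, data, cs, pit):
    """When the data packet comes back, cache it and satisfy every pending face."""
    cs[name] = data
    for face in pit.pop(name, set()):
        face.send_data(data)
```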
And 105, updating model parameters and weights of the vehicle-mounted high-definition map data source selection network of the automatic driving vehicle receiving the data source.
Learning aggregation is motivated primarily by the following two key observations from highly dynamic vehicle networks. 1) Different user driving behavior implies different network topologies. For example, some users drive vehicles at different speeds and through RSU coverage areas, which means that the network dynamics level is different. A vehicle traveling in reverse causes a change in the network topology even if the user drives the vehicle at a constant speed. Thus, the network will learn to frequently alter the data source selection to match the current network topology change. 2) Although the user may select the RSU or vehicle as the data source, the vehicle status may change under different network conditions, and its content may also change frequently. In this case, the model may reduce the training cost in the initial phase when the vehicle joins the network topology. However, due to the highly dynamic vehicle network, the typically trained model is not well suited for each vehicle in its previous network topology.
Therefore, in order to balance between individual diversity and historical experience, the invention provides a learning aggregation model and a weight updating method by introducing the federal learning thought. Fig. 4 illustrates a learning aggregation and weight update method, which consists of three parts, namely personalized independent learning, weight aggregation and cold start.
The infrastructure RSU to which the present invention relates is an edge network device that makes it possible to provide model aggregation and updating for vehicles. The training characteristics of the automatic driving vehicles in selecting the data sources are the same, the main difference is that the environmental state acquired by each vehicle can be different, so the trained model has universality. If each vehicle can provide the individualized model independently learned by the vehicle to the RSU, the RSU can accelerate the convergence speed of the model training by providing the parameters of the RSU after model aggregation to the vehicle.
The method comprises the following steps of updating model parameters and weights of a vehicle-mounted high-definition map data source selection network of an automatic driving vehicle receiving a data source, wherein the steps comprise the following steps:
a vehicle-mounted high-definition map data source selection network of the automatic driving vehicle learns an environment sample and updates a network model and model weight through interaction with a real environment;
and updating the vehicle-mounted high-definition map data source selection network model based on a federal learning method.
To reduce training time and communication costs, the present invention uses the Federated Averaging (FedAvg) algorithm to update the models and weights. Each vehicle trains the model with local data and environment states, uploads its weight result to the RSU after multiple iterations, and the RSU then collects the training results and computes the weighted average of the weights. In this case, FedAvg only slightly increases the computation overhead of the vehicle, but because it uploads only the model parameters rather than the local data, it significantly reduces the communication overhead, especially when the update frequency increases markedly. When the RSU receives weight results from different vehicles, the invention performs the weight aggregation operation, taking a weighted average of the updated weights into the aggregated weight ω:
ω = Σ_k (n_k / n) · ω_k    (11)
wherein n_k is the number of local weight updates of vehicle k, and n is the total number of aggregation-weight updates at the RSU side. The RSU updates the weight once per day for each vehicle, in particular when the vehicle first associates to an RSU or to a new RSU.
Generally, autonomous vehicles encounter a cold-start problem during the first model training or when entering a new environment: in this case there is no similar experience in the experience replay library shown in fig. 2, which would lead to slow convergence or poor results. The invention therefore updates the model with a federated-learning-based approach: at the beginning of training, the vehicle first requests the aggregated weight ω from the RSU, and a model initialized with ω trains faster than a cold-started model. A vehicle that is not cold-started uploads its independently learned weights to the RSU when it first connects to the RSU each day, and at the same time downloads the aggregated weights to update its model. The RSU side monitors the independently learned models uploaded by the vehicles, performs the weight aggregation, and returns the aggregated weight to the vehicles. The vehicle side requests the aggregated weight ω from the RSU, trains ω_k through T rounds, and uploads it to the RSU. Macroscopically, through parallel learning by the vehicles and aggregation of the models by the RSU, the network model experiences a sufficient diversity of network environments from different vehicles and thus acquires the ability to cope with highly dynamic networks.
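The vehicle-side flow (cold start plus the daily weight exchange) might look like the sketch below: request the aggregated weight ω from the RSU, run T local training rounds, then upload the locally learned ω_k together with the local update count. The RSU and trainer interface names are assumptions.

```python
def vehicle_federated_round(model, rsu, local_trainer, T: int):
    # Cold start / daily synchronization: initialize from the RSU's aggregated weight w
    model.set_weights(rsu.download_aggregated_weights())

    # Personalized independent learning: T rounds of local training
    for _ in range(T):
        local_trainer.train_one_round(model)

    # Upload only the locally learned weights w_k (never the local data) to the RSU
    rsu.upload_local_weights(model.get_weights(), local_trainer.update_count)
```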
In order to implement the foregoing embodiment, the present invention further provides an on-vehicle high definition map data source content distribution apparatus, as shown in fig. 5, including:
the network construction module 310 is used for constructing a vehicle-mounted high-definition map data source selection network, and the vehicle-mounted high-definition map data source selection network comprises an offline training network and an online selection network;
the network training module 320 is configured to train the offline training network by using state information of different existing vehicle-mounted high-definition map data sources as a training data set, and apply network parameters of the offline training network to the online selection network after training is completed;
the data source selection module 330 is configured to input the state information of the plurality of vehicle-mounted high-definition map data sources received in real time into the trained online selection network during the driving process of the automatic driving vehicle, where an output result is a selection result of the vehicle-mounted high-definition map data sources;
the distribution module 340 is used for distributing the content of the selected optimal vehicle-mounted high-definition map data source to the automatic driving vehicle through a named data network;
and the updating module 350 is configured to update the model parameters and the weights of the vehicle-mounted high-definition map data source selection network of the automatic driving vehicle receiving the data source.
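The cooperation of the five modules can be pictured with the following Python composition; it is only an illustrative sketch (class and attribute names are assumptions), not an implementation of the apparatus.

class HDMapContentDistributionDevice:
    # Illustrative wiring of modules 310-350; each attribute is expected to expose
    # the behaviour described above for the corresponding module.
    def __init__(self, builder, trainer, selector, distributor, updater):
        self.builder = builder          # 310: builds the offline training and online selection networks
        self.trainer = trainer          # 320: trains the offline network, copies parameters to the online network
        self.selector = selector        # 330: runs the online network on real-time data-source states
        self.distributor = distributor  # 340: distributes the chosen source's map content over NDN
        self.updater = updater          # 350: federated update of model parameters and weights

    def handle(self, source_states):
        chosen_source = self.selector.select(source_states)
        self.distributor.distribute(chosen_source)
        self.updater.update(chosen_source)
        return chosen_source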
In order to achieve the above embodiments, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the vehicle-mounted high-definition map data source content distribution method according to the embodiments of the present invention.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A vehicle-mounted high-definition map data source content distribution method is characterized by comprising the following steps:
constructing a vehicle-mounted high-definition map data source selection network, wherein the vehicle-mounted high-definition map data source selection network comprises an offline training network and an online selection network;
training the offline training network by using state information of different existing vehicle-mounted high-definition map data sources as a training data set, and applying network parameters of the offline training network to the online selection network after training is completed;
in the running process of the automatic driving vehicle, inputting the state information of a plurality of vehicle-mounted high-definition map data sources received in real time into the trained online selection network, wherein the output result is the selection result of the vehicle-mounted high-definition map data sources;
distributing the content of the selected optimal vehicle-mounted high-definition map data source to the automatic driving vehicle through a named data network;
and updating the model parameters and the weight of the vehicle-mounted high-definition map data source selection network of the automatic driving vehicle receiving the data source.
2. The method for distributing the contents of the vehicle-mounted high-definition map data source according to claim 1, wherein an off-line training network and an on-line selection network of the vehicle-mounted high-definition map data source selection network adopt DDQN neural networks; wherein the step of training the offline training network comprises:
collecting a vehicle-mounted high-definition map data source through a collector of the off-line training network, extracting state information, and dividing a training set and a test set to be used as training data of the off-line training network;
constructing the off-line training network; the off-line training network comprises two reinforcement learning networks DQN, and the two reinforcement learning networks DQN are synchronously trained;
inputting the training set into two reinforcement learning networks DQN for training, optimizing and updating parameters of a first reinforcement learning network DQN, finding out the action with the maximum Q value in the first reinforcement learning network DQN, and calculating the Q value meeting the requirement by using a second reinforcement learning network DQN;
and constructing a loss function of the offline training network to represent the difference between the Q value learned in real time and the target Q value, judging that training of the offline training network is finished when the loss function is minimized, and inputting the test set into the trained offline training network to verify the training accuracy.
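As a rough sketch of this offline training step, the following PyTorch code trains a first (online) DQN against a second (target) DQN on samples drawn from a replay set; the network size, hyperparameters and the 16-dimensional state / 4-action layout are assumptions made for illustration rather than values fixed by the claim.

import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    # Small MLP mapping a data-source state vector to one Q value per candidate action
    def __init__(self, state_dim=16, n_actions=4):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                    nn.Linear(64, n_actions))

    def forward(self, x):
        return self.layers(x)

def train_offline(replay, steps=1000, batch_size=32, gamma=0.9, lr=1e-3):
    # replay: list of (state, action, reward, next_state) tuples, states given as torch tensors
    online_q, target_q = QNet(), QNet()
    target_q.load_state_dict(online_q.state_dict())
    optimizer = torch.optim.Adam(online_q.parameters(), lr=lr)
    for _ in range(steps):
        s, a, r, s2 = zip(*random.sample(replay, batch_size))
        s, s2 = torch.stack(s), torch.stack(s2)
        a = torch.tensor(a, dtype=torch.int64)
        r = torch.tensor(r, dtype=torch.float32)
        with torch.no_grad():
            a_max = online_q(s2).argmax(dim=1)          # action with maximum Q value in the first network
            y = r + gamma * target_q(s2).gather(1, a_max.unsqueeze(1)).squeeze(1)  # target Q from the second network
        q = online_q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = ((y - q) ** 2).mean()                     # squared difference between learned and target Q values
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return online_q, target_q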
3. The method for distributing the contents of the data source of the vehicle-mounted high-definition map according to claim 2, wherein after the off-line training of the training network is completed, the network parameters of the off-line training network are applied to the on-line selection network; after the on-line selection network selects the data source of the vehicle-mounted high-definition map data source, experience information is generated and sent to the off-line training network so as to enable the off-line training network to carry out training iteration.
4. The content distribution method for the vehicle-mounted high-definition map data sources according to claim 1, wherein the step of receiving the state information of the plurality of vehicle-mounted high-definition map data sources in real time comprises the following steps:
the automatic driving vehicle sends a detection interest packet outwards to search a potential vehicle-mounted high-definition map data source and state information;
and after the vehicle-mounted high-definition map data sources are received, screening them through a filter of the online selection network to obtain a vehicle-mounted high-definition map data source set which is used as the input of the online selection network.
5. The content distribution method for the vehicle-mounted high-definition map data source according to claim 2, wherein the formula for the action with the maximum Q value found by the first reinforcement learning network DQN is expressed as:
a_max(s′, ω) = argmax_{a′} Q(s′, a′, ω)   (1)
wherein s represents a state, a represents an action, and ω is a weight parameter;
the action a_max(s′, ω) is then used in the second reinforcement learning network DQN to obtain the target Q value, expressed by the formula:
y = r + γ Q′(s′, argmax_{a′} Q(s′, a′, ω), ω⁻)   (2)
where γ represents the discount factor and r represents the reward.
6. The method for distributing the contents of the vehicle-mounted high-definition map data source according to claim 2, wherein the loss function of the offline training network is expressed as:
L(ω) = (1/m) Σ_{j=1}^{m} (y_j − Q(s_j, a_j, ω))²   (3)
wherein m represents the number of training samples;
the weight of the second reinforcement learning network DQN is updated using a soft (smoothed) update method, calculated as shown in formula (4):
ω⁻ ← l·ω + (1 − l)·ω⁻   (4)
wherein l represents the update rate, l < 1, and ω⁻ is the weight parameter of the second reinforcement learning network DQN;
before the vehicle-mounted high-definition map data source selection is performed, the selection is triggered; the conditions judged to trigger the vehicle-mounted high-definition map data source selection at least include: selecting a data source for the autonomous vehicle when the vehicle is initialized, and selecting a new data source for the autonomous vehicle when the quality of the connected link deteriorates or the connection is broken;
to judge whether the quality of the connected link is poor or the connection is broken, the packet loss rate of the data source currently connected to the autonomous vehicle is calculated, the current timestamp is taken as a random seed, and the hit switching probability is calculated with a random function, expressed as:
Figure FDA0003364026170000022
a random number within 100 is generated by the rand() function; if
Figure FDA0003364026170000023
holds, the probability hit indicates that the data source switch is executed; the formula is expressed as:
Figure FDA0003364026170000024
wherein H represents the data source switching judgment flag and P_max represents the maximum packet loss rate; that is, once the link packet loss rate exceeds the maximum packet loss rate, the data source is switched directly without using the probability calculation for the switching judgment.
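A compact sketch of the soft update of formula (4) and of this switching trigger follows; the mapping from the measured packet loss rate to the 0-100 hit probability used here is an illustrative assumption rather than the exact formula of the claim, and the function names are likewise assumed.

import random
import time

def soft_update(omega_online, omega_target, l=0.01):
    # Formula (4): omega^- <- l*omega + (1 - l)*omega^-, applied element-wise
    return [l * w + (1.0 - l) * w_t for w, w_t in zip(omega_online, omega_target)]

def should_switch_source(packet_loss_rate, p_max=0.2):
    # If the link packet loss rate exceeds P_max, switch directly (H = 1)
    if packet_loss_rate > p_max:
        return True
    # Otherwise seed with the current timestamp and draw a random number within 100;
    # the hit probability below is an assumed illustration, not the claimed formula
    random.seed(int(time.time()))
    hit_probability = 100.0 * packet_loss_rate / p_max
    return random.randint(0, 99) < hit_probability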
7. The method for distributing the contents of the vehicle-mounted high-definition map data source according to claim 4, wherein the step of screening through a filter of the online selection network after the vehicle-mounted high-definition map data sources are received comprises:
the selector of the online selection network receives the state information of the i-th data source at time t, s_{t,i} = (N_{t,i}, M_{t,i}, V_{t,i}, D_{t,i}), wherein N_{t,i} represents the round-trip time RTT between the i-th data source and the vehicle at time t; M_{t,i} represents the time interval at which the i-th data source sends data packets; V_{t,i} represents the travel speed of the i-th data source at time t; and D_{t,i} represents the distance between the i-th data source and the vehicle sending the request at time t;
N_{t,i} represents a smoothed RTT value; the smaller the RTT, the smaller the round-trip delay of the data source and the better the network performance; when RTT values from the same data source are obtained multiple times, smoothing them gives higher reliability, and with the smoothing method of the Jacobson/Karels algorithm the calculation formula is:
N_{t,i} = u·N_{t−1,i} + e·(R_{t,i} − N_{t−1,i})   (7)
wherein R_{t,i} represents the currently observed instantaneous RTT value, u = 1, and e = 0.125;
M_{t,i} represents the smoothed sending interval; for the same bandwidth, the larger the interval, the more idle the data source and the larger its remaining available bandwidth; conversely, the smaller the interval, the smaller the remaining available bandwidth of the data source; the calculation formula is:
M_{t,i} = (1 − σ)·M_{t−1,i} + σ·(Data_{t,i} − Data_{t−1,i})   (8)
wherein σ = 0.5, Data_{t,i} represents the sending time of the current state information data, Data_{t−1,i} represents the sending time of the previous state information data, and subtracting the two gives the time interval;
V_{t,i} represents the vehicle speed; V_{t,i} > 0 indicates that the corresponding data source travels in the same direction as the requesting autonomous vehicle, and V_{t,i} < 0 indicates that it travels in the opposite direction; the lower the speed, the more stable the data source;
D_{t,i} represents the distance between the i-th data source and the requesting autonomous vehicle at time t; D_{t,i} > 0 indicates that the corresponding data source is ahead of the autonomous vehicle, and D_{t,i} < 0 indicates that it is behind; the closer the data source, the higher its stability;
the data sources are sorted in turn according to each of the contained state information items, and the data source with the best state is screened out; each of the 4 states selects one optimal data source, and the 4 optimal data sources contain 4 groups, i.e. 16 state information values, which serve as the input of the online selection network.
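The smoothing formulas (7) and (8) and the four-way screening can be sketched in Python as follows; the dictionary keys and the use of absolute speed and absolute distance as ranking criteria are illustrative assumptions.

def smooth_rtt(prev_rtt, observed_rtt, u=1.0, e=0.125):
    # Formula (7): N_{t,i} = u*N_{t-1,i} + e*(R_{t,i} - N_{t-1,i})
    return u * prev_rtt + e * (observed_rtt - prev_rtt)

def smooth_interval(prev_interval, data_time_now, data_time_prev, sigma=0.5):
    # Formula (8): M_{t,i} = (1 - sigma)*M_{t-1,i} + sigma*(Data_t - Data_{t-1})
    return (1.0 - sigma) * prev_interval + sigma * (data_time_now - data_time_prev)

def filter_best_sources(sources):
    # sources: list of dicts with keys "rtt", "interval", "speed", "distance"
    # One optimal source is kept per state: smallest RTT, largest send interval,
    # lowest absolute speed, shortest absolute distance.
    best = [
        min(sources, key=lambda s: s["rtt"]),
        max(sources, key=lambda s: s["interval"]),
        min(sources, key=lambda s: abs(s["speed"])),
        min(sources, key=lambda s: abs(s["distance"])),
    ]
    # 4 sources x 4 state values = the 16 inputs of the online selection network
    return [s[k] for s in best for k in ("rtt", "interval", "speed", "distance")]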
8. The method for distributing the contents of the vehicle-mounted high-definition map data source according to claim 7, wherein the step of sorting in turn according to each of the contained state information items and screening out the data source with the best state comprises:
calculating the optimal value of each state, i.e.
Figure FDA0003364026170000031
and calculating the score of the data source selected when performing action a_t:
Figure FDA0003364026170000032
wherein Max{G_{t,i}} denotes the highest score of the data source i selected when executing action a_t in state s_t;
the state parameters are normalized, the value range of the action parameters is adjusted by tuning the network parameters of the online selection network, a mapping between data sources and state parameter values is constructed, and the data source with the highest score is determined as the final selection result;
meanwhile, the method further comprises setting the reward corresponding to the data source; the reward function is expressed as:
Figure FDA0003364026170000041
wherein
Figure FDA0003364026170000042
denotes the throughput of the link,
Figure FDA0003364026170000043
denotes the duration of the link, and
Figure FDA0003364026170000044
denotes the RTT value of the currently connected data source, a smoothed RTT calculated via N_{t,i}; the value range of the other index coefficients is greater than 1 and less than or equal to 2, and
Figure FDA0003364026170000045
0 < φ ≤ 0.5.
9. The method for distributing the contents of the vehicle-mounted high-definition map data source according to claim 1, wherein the step of updating the model parameters and the weights of the vehicle-mounted high-definition map data source selection network of the autonomous vehicle receiving the data source comprises the steps of:
a vehicle-mounted high-definition map data source selection network of the automatic driving vehicle learns an environment sample and updates a network model and model weight through interaction with a real environment;
updating a vehicle-mounted high-definition map data source selection network model based on a federal learning method;
wherein in the step of interacting with the real environment, learning environment samples and updating the network model and model weights,
the network model and model weights are updated using the Federated Averaging (FedAvg) algorithm: the vehicle-mounted high-definition map data source selection network model is trained using local data and environmental states, the weight results are transmitted to an infrastructure RSU after multiple iterations, and the infrastructure RSU computes a weighted average of the collected weight results of all autonomous vehicles, the calculation formula being:
ω = Σ_k (n_k / n) · ω_k
wherein n_k is the number of local weight updates of autonomous vehicle k, ω_k is the weight uploaded by vehicle k, and n is the total number of updates of the RSU-side aggregation weight; the RSU updates the weight once per day for each autonomous vehicle, in particular when an autonomous vehicle is first associated to an RSU or to a new RSU.
10. The vehicle-mounted high-definition map data source content distribution device is characterized by comprising the following components:
a network construction module, configured to construct a vehicle-mounted high-definition map data source selection network, wherein the vehicle-mounted high-definition map data source selection network comprises an offline training network and an online selection network;
the network training module is used for training the offline training network by using the state information of different existing vehicle-mounted high-definition map data sources as a training data set, and applying the network parameters of the offline training network to the online selection network after the training is finished;
the data source selection module is used for inputting the state information of a plurality of vehicle-mounted high-definition map data sources received in real time into the trained online selection network in the running process of the automatic driving vehicle, and the output result is the selection result of the vehicle-mounted high-definition map data sources;
the distribution module is used for distributing the content of the selected optimal vehicle-mounted high-definition map data source to the automatic driving vehicle through a named data network;
and the updating module is used for updating the model parameters and the weight of the vehicle-mounted high-definition map data source selection network of the automatic driving vehicle receiving the data source.
CN202111376416.XA 2021-11-19 2021-11-19 Vehicle-mounted high-definition map data source content distribution method and device Pending CN114332384A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111376416.XA CN114332384A (en) 2021-11-19 2021-11-19 Vehicle-mounted high-definition map data source content distribution method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111376416.XA CN114332384A (en) 2021-11-19 2021-11-19 Vehicle-mounted high-definition map data source content distribution method and device

Publications (1)

Publication Number Publication Date
CN114332384A true CN114332384A (en) 2022-04-12

Family

ID=81046285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111376416.XA Pending CN114332384A (en) 2021-11-19 2021-11-19 Vehicle-mounted high-definition map data source content distribution method and device

Country Status (1)

Country Link
CN (1) CN114332384A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200363814A1 (en) * 2019-05-15 2020-11-19 Baidu Usa Llc Offline agent using reinforcement learning to speedup trajectory planning for autonomous vehicles
US20210004017A1 (en) * 2019-07-05 2021-01-07 DeepMap Inc. Using high definition maps for generating synthetic sensor data for autonomous vehicles
WO2021155685A1 (en) * 2020-02-04 2021-08-12 华为技术有限公司 Map updating method, apparatus and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAN WU et al.: "RLSS: A Reinforcement Learning Scheme for HD Map Data Source Selection in Vehicular NDN", IEEE, 17 November 2021 (2021-11-17), pages 2-5 *

Similar Documents

Publication Publication Date Title
CN110213796B (en) Intelligent resource allocation method in Internet of vehicles
CN110361024B (en) Method and system for dynamic lane-level vehicle navigation with vehicle group identification
CN108847037B (en) Non-global information oriented urban road network path planning method
JP6484361B2 (en) Real-time vehicle state trajectory prediction for vehicle energy management and autonomous driving
CN109747427A (en) The method and apparatus of remaining driving ability when estimation electric vehicle arrives at the destination
US10139245B2 (en) Device for providing electric-moving-body information and method for providing electric-moving-body information
CN108458716A (en) A kind of electric vehicle charging air navigation aid based on the prediction of charging pile dynamic occupancy
WO2019071909A1 (en) Automatic driving system and method based on relative-entropy deep inverse reinforcement learning
CN113096418B (en) Traffic network traffic light control method, system and computer readable storage medium
JP6081806B2 (en) Electric vehicle information providing apparatus, electric vehicle information providing method, program, and EV management system
CN110174893A (en) A kind of unmanned control method, system and vehicle
CN104751650B (en) A kind of method and apparatus being controlled to road traffic signal
CN115708343A (en) Method for collecting data from a set of vehicles
JP2024514078A (en) Route planner and decision making to explore new roads to improve maps
CN114374741A (en) Dynamic grouping internet-of-vehicle caching method based on reinforcement learning under MEC environment
CN113442920A (en) Control method and device for formation driving, computer readable medium and electronic equipment
Hui et al. Time or reward: Digital-twin enabled personalized vehicle path planning
CN114332384A (en) Vehicle-mounted high-definition map data source content distribution method and device
Wang et al. An adaptive deep q-learning service migration decision framework for connected vehicles
CN114328547A (en) Vehicle-mounted high-definition map data source selection method and device
CN108256662A (en) The Forecasting Methodology and device of arrival time
JP2000057483A (en) Method and device for predicting traffic condition and recording medium storing traffic condition prediction program
CN114915940A (en) Vehicle-road communication link matching method and system based on edge cloud computing
CN113420942A (en) Sanitation truck real-time route planning method based on deep Q learning
Kale et al. Deep Learning Based Multi-Zone AVP System Utilizing V2I Communications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination