CN115002725A - Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment - Google Patents
Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment
- Publication number
- CN115002725A (application number CN202210612465.7A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- resource allocation
- task
- unmanned aerial
- aerial vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
- H04W4/44—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/18502—Airborne stations
- H04B7/18506—Communications with or from aircraft, i.e. aeronautical mobile service
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/08—Load balancing or load distribution
- H04W28/09—Management thereof
- H04W28/0958—Management thereof based on metrics or performance parameters
- H04W28/0967—Quality of Service [QoS] parameters
- H04W28/0975—Quality of Service [QoS] parameters for reducing delays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
- H04W4/029—Location-based management or tracking services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
- H04W4/46—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for vehicle-to-vehicle communication [V2V]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W84/00—Network topologies
- H04W84/02—Hierarchically pre-organised networks, e.g. paging networks, cellular networks, WLAN [Wireless Local Area Network] or WLL [Wireless Local Loop]
- H04W84/04—Large scale networks; Deep hierarchical networks
- H04W84/06—Airborne or Satellite Networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application relates to the field of Internet of Vehicles resource allocation, and provides an unmanned aerial vehicle (UAV)-assisted Internet of Vehicles resource allocation method and device and an electronic device. The method comprises the following steps: predicting the position of a vehicle at the next moment from detected track-point data of the vehicle; receiving a task offloading request from the vehicle, and establishing, based on the task offloading request, a resource allocation model for allocating Internet of Vehicles resources to the vehicle, where the task offloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task, and the maximum delay the task can tolerate, and the vehicle association mode is either vehicle-UAV associated or vehicle-UAV non-associated; and solving the resource allocation model based on a reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy. By taking into account the mobility of vehicles and the time-varying nature of resource requests, the embodiments of the application satisfy the delay limits and quality-of-service requirements of vehicle tasks.
Description
Technical Field
The application relates to the technical field of Internet of Vehicles resource allocation, and in particular to an unmanned aerial vehicle (UAV)-assisted Internet of Vehicles resource allocation method and device and an electronic device.
Background
The Internet of Vehicles is a product of the convergence of the Internet and the Internet of Things, and provides convenient and diverse services for intelligent transportation. At present, the two major camps of Internet of Vehicles technology are dedicated short-range communication (DSRC), led by the United States, and the LTE-V (Long Term Evolution for Vehicles) system for vehicle-to-vehicle communication promoted by domestic enterprises. With the rapid development of industrial Internet of Things technology in vehicular networks, data exchange between vehicles, between vehicles and pedestrians, and between vehicles and infrastructure units is ever more frequent and demands strong data-processing capability. While providing services, information from surrounding vehicles must be processed continuously, and the data volume is very large; reasonable Internet of Vehicles resource allocation is therefore essential for reducing interference, improving network efficiency, and ultimately optimizing wireless communication performance.
At present, most existing vehicle resource allocation schemes ignore the mobility of vehicles and the time-varying nature of resource requests, and therefore cannot meet the delay limits and quality-of-service requirements of vehicle tasks.
Disclosure of Invention
The embodiments of the application provide a UAV-assisted Internet of Vehicles resource allocation method and device and an electronic device, aiming to solve the technical problem that existing vehicle resource allocation schemes mostly ignore the mobility of vehicles and the time-varying nature of resource requests, and thus cannot meet the delay limits and quality-of-service requirements of vehicle tasks.
In a first aspect, an embodiment of the present application provides a UAV-assisted Internet of Vehicles resource allocation method, comprising:
predicting the position of a vehicle at the next moment from detected track-point data of the vehicle;
receiving a task offloading request from the vehicle, and establishing, based on the task offloading request, a resource allocation model for allocating Internet of Vehicles resources to the vehicle, the task offloading request comprising a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task, and the maximum delay the task can tolerate, the vehicle association mode being either vehicle-UAV associated or vehicle-UAV non-associated; and
solving the resource allocation model based on a reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy.
In one embodiment, predicting the position of the vehicle at the next moment from the detected track-point data comprises:
determining a plurality of track points of the detected vehicle;
calculating the velocity and acceleration corresponding to the plurality of track points of the vehicle; and
calculating the distance between the vehicle and the UAV and the azimuth of the vehicle from the velocity and acceleration of the plurality of track points.
In one embodiment, establishing a resource allocation model for allocating Internet of Vehicles resources to the vehicle based on the task offloading request comprises:
establishing the resource allocation model based on the task offloading request, the resources available to the vehicle, the resources available to the UAV, the transmission rate of the vehicle-UAV uplink, and the transmission rate of the vehicle's own link.
In one embodiment, solving the resource allocation model based on the reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy comprises:
obtaining an optimization problem function from the resource allocation model;
transforming the optimization problem function with a reinforcement learning method; and
solving the transformed result with the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy.
In one embodiment, transforming the optimization problem function based on the reinforcement learning method comprises:
transforming the optimization problem function into an environment state space set, an action decision set, and a reward function,
where the environment state space set comprises the amount of computing resources required by the vehicle task, the amount of data required by the task, the maximum delay the task can tolerate, the vehicle positions, and the UAV positions;
the action decision set comprises the vehicle association mode and the proportions of computing resources and cache resources the UAV allocates to each associated vehicle; and
the reward function is constructed from the maximum delay the vehicle task can tolerate and the cache resources the vehicle possesses.
In one embodiment, solving the transformed result with the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy comprises:
initializing the network parameters of the deep deterministic policy gradient method, selecting an action decision from the action decision set based on the state in the environment state space set, and executing it to obtain the reward function; and
training the networks of the deep deterministic policy gradient method with empirical data as the training set, and updating the network parameters to obtain the optimal vehicle association mode and a resource allocation strategy.
In a second aspect, an embodiment of the present application provides a UAV-assisted Internet of Vehicles resource allocation device, comprising:
a vehicle position prediction module, configured to predict the position of the vehicle at the next moment from detected track-point data of the vehicle;
a resource allocation model establishing module, configured to receive a task offloading request from the vehicle and to establish, based on the task offloading request, a resource allocation model for allocating Internet of Vehicles resources to the vehicle, the task offloading request comprising a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task, and the maximum delay the task can tolerate, the vehicle association mode being either vehicle-UAV associated or vehicle-UAV non-associated; and
a solving module, configured to solve the resource allocation model based on a reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a computer program, where the processor, when executing the program, implements the steps of the UAV-assisted Internet of Vehicles resource allocation method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the UAV-assisted Internet of Vehicles resource allocation method of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the UAV-assisted Internet of Vehicles resource allocation method of the first aspect.
According to the UAV-assisted Internet of Vehicles resource allocation method provided by the embodiments of the application, the position of the vehicle at the next moment is predicted from the detected track-point data of the vehicle, so that vehicle mobility is taken into account and communication data is exchanged between vehicles in time; a task offloading request from the vehicle is received, and a resource allocation model for allocating Internet of Vehicles resources to the vehicle is established based on that request; and the resource allocation model is solved with a reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy. The embodiments thus take the time-varying nature of resources into account and allocate the limited resources reasonably and dynamically to the vehicles requesting them, so that the delay limits and quality-of-service requirements of vehicle tasks are met, the performance of vehicle-mounted communication is improved, and optimization over the vehicles' continuous action space is stable and converges well.
Drawings
In order to illustrate the technical solutions of the present application or the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of a UAV-assisted Internet of Vehicles resource allocation method provided by an embodiment of the present application;
Fig. 2 is a UAV-assisted Internet of Vehicles scenario provided by an embodiment of the present application;
Fig. 3 is a second schematic flowchart of a UAV-assisted Internet of Vehicles resource allocation method provided by an embodiment of the present application;
Fig. 4 is a third schematic flowchart of a UAV-assisted Internet of Vehicles resource allocation method provided by an embodiment of the present application;
Fig. 5 is a deep reinforcement learning model based on the deep deterministic policy gradient method provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of the algorithm flow of the deep deterministic policy gradient method provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a UAV-assisted Internet of Vehicles resource allocation device provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
Fig. 1 shows a UAV-assisted Internet of Vehicles resource allocation method. Referring to Fig. 1, an embodiment of the present application provides a UAV-assisted Internet of Vehicles resource allocation method, which may include the following steps.
Step 100: the electronic device predicts the position of the vehicle at the next moment from the detected track-point data of the vehicle. The electronic device may be a UAV. Referring to Fig. 2, which shows a UAV-assisted Internet of Vehicles scenario according to an embodiment of the present application, the scenario consists of N vehicles on a straight two-way road and M rotary-wing UAVs deployed in the air, and resources in the Internet of Vehicles are allocated effectively so as to maximize the number of tasks successfully completed by vehicles and UAVs. A task successfully completed by a vehicle and a UAV means that the vehicle is associated with the UAV and offloads the task to the UAV's MEC (Multi-access Edge Computing) server for execution.
In one embodiment, referring to Fig. 3, step 100 of predicting the position of the vehicle at the next moment from the detected track-point data comprises:
the electronic device determines a plurality of trajectory point data for the detected vehicle. Specifically, in practice, a plurality of track point data of vehicle can be perceived through the radar device of a plurality of different unmanned aerial vehicles. For example, when the number of the unmanned aerial vehicles is S, the S unmanned aerial vehicles can sense three track points of the vehicle through the radar device. If a certain vehicle is in the coverage area of S unmanned aerial vehicles, the three trace point sets sensed by the S unmanned aerial vehicles are respectively:
Set a: {(x_{n,1}, y_{n,1}), (x_{n,2}, y_{n,2}), ..., (x_{n,S}, y_{n,S})};
Set b: {(x_{n-1,1}, y_{n-1,1}), (x_{n-1,2}, y_{n-1,2}), ..., (x_{n-1,S}, y_{n-1,S})};
Set c: {(x_{n-2,1}, y_{n-2,1}), (x_{n-2,2}, y_{n-2,2}), ..., (x_{n-2,S}, y_{n-2,S})}.
In this embodiment, three fused track points of the vehicle, (x_n, y_n), (x_{n-1}, y_{n-1}), and (x_{n-2}, y_{n-2}), are obtained by taking a weighted average of the three sets of track-point data: for each of the three time slots, the fused point is a weighted mean of the S UAVs' measurements, e.g. x_n = Σ_{s=1}^{S} w_s x_{n,s} with weights w_s summing to one (the weights w_s are introduced here for notation). These fused points (x_n, y_n), (x_{n-1}, y_{n-1}), (x_{n-2}, y_{n-2}) constitute the plurality of detected track points of the vehicle determined in this embodiment.
Step 120: the electronic device calculates the velocity and acceleration of the vehicle corresponding to the plurality of track points. Specifically, writing p_n = (x_n, y_n) for the fused track point of the n-th time slot, the velocity and acceleration follow by finite differences:
v_n = (p_n − p_{n−1}) / ΔT, a_n = (v_n − v_{n−1}) / ΔT,
where v denotes the velocity, a denotes the acceleration, and ΔT denotes the time interval from one time slot to the next.
Step 130: the electronic device calculates the distance between the vehicle and the UAV and the azimuth of the vehicle from the velocity and acceleration of the plurality of track points. Specifically, the acceleration of the vehicle is assumed to change little from the (n−2)-th time slot to the n-th time slot, i.e. a_{x,n} ≈ a_{x,n−1} ≈ a_{x,n−2}, and the position coordinates of the vehicle at the next moment, including the azimuth of the vehicle, are predicted from the vehicle state corresponding to the three track points. Substituting the finite-difference velocity and acceleration into the extrapolation gives:
x_{n+1|n} = 3x_n − 3x_{n−1} + x_{n−2},
y_{n+1|n} = 3y_n − 3y_{n−1} + y_{n−2}.
From the predicted distances along the x- and y-axes, the distance between the vehicle and the corresponding UAV is then calculated from the known fixed flight height H of the UAV and the Pythagorean theorem:
d_{n+1|n} = sqrt( Δx² + Δy² + H² ),
where x_{n+1|n} and y_{n+1|n} are the predicted coordinates of the vehicle in the (n+1)-th time slot, Δx and Δy denote the horizontal offsets on the x- and y-axes between the vehicle and the UAV in the (n+1)-th time slot, H is the fixed flight height of the UAV, d_{n+1|n} is the straight-line distance from the vehicle to the UAV in the (n+1)-th time slot, and the azimuth of the vehicle in the (n+1)-th time slot follows from the horizontal offsets, e.g. φ_{n+1|n} = arctan(Δy / Δx).
In this embodiment, the radar devices of the UAVs detect track-point data of the vehicle, and the prediction formula predicts the next position of the vehicle from the track-point data, so that the vehicle position is sensed and communication data is exchanged between vehicles in time.
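The fusion and prediction steps above can be sketched in Python. This is a minimal sketch under stated assumptions: uniform fusion weights are assumed where the source does not reproduce them, and the helper names (`fuse_track_point`, `predict_next_position`, `distance_and_azimuth`) are introduced here for illustration.

```python
import math

def fuse_track_point(points, weights=None):
    """Weighted average of one track point as seen by S UAVs.

    points  -- list of (x, y) measurements from the S UAVs
    weights -- optional per-UAV weights (uniform weights assumed when omitted)
    """
    if weights is None:
        weights = [1.0 / len(points)] * len(points)
    x = sum(w * p[0] for w, p in zip(weights, points))
    y = sum(w * p[1] for w, p in zip(weights, points))
    return (x, y)

def predict_next_position(p_n, p_n1, p_n2):
    """Constant-acceleration extrapolation: p_{n+1|n} = 3 p_n - 3 p_{n-1} + p_{n-2}."""
    return tuple(3 * a - 3 * b + c for a, b, c in zip(p_n, p_n1, p_n2))

def distance_and_azimuth(vehicle_xy, drone_xy, height):
    """Straight-line distance to a UAV hovering at fixed height H (Pythagorean
    theorem), plus the horizontal azimuth of the vehicle relative to the UAV."""
    dx = vehicle_xy[0] - drone_xy[0]
    dy = vehicle_xy[1] - drone_xy[1]
    dist = math.sqrt(dx * dx + dy * dy + height * height)
    azimuth = math.atan2(dy, dx)
    return dist, azimuth
```

For a vehicle moving uniformly along the x-axis through (0, 0), (2, 0), (4, 0), the extrapolation yields (6, 0), as expected for constant velocity.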
Step 200: receiving a task offloading request from the vehicle, and establishing, based on the task offloading request, a resource allocation model for allocating Internet of Vehicles resources to the vehicle. The task offloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task, and the maximum delay the task can tolerate; the vehicle association mode is either vehicle-UAV associated or vehicle-UAV non-associated.
After the UAV predicts the position of the vehicle at the next moment, the vehicle may randomly generate different computing tasks and send a task offloading request to the UAV as required. The task offloading request includes the vehicle association mode, modeled by the binary variable b_i(t) ∈ {0, 1}: b_i(t) = 1 means that vehicle i is associated with the UAV and offloads its task to the UAV's MEC server for execution; b_i(t) = 0 means that the vehicle is not associated with a UAV and performs its computing task itself. The task offloading request sent by vehicle i at time t also carries the amount of computing resources required by the task, the amount of data required by the task, and the maximum delay the task can tolerate, respectively.
The electronic device receives the task offloading request of the vehicle and establishes, based on it, a resource allocation model for allocating Internet of Vehicles resources to the vehicle.
In an embodiment, step 200 of establishing a resource allocation model for allocating Internet of Vehicles resources to the vehicle based on the task offloading request specifically comprises:
establishing, by the electronic device, the resource allocation model based on the task offloading request, the resources available to the vehicle, the resources available to the UAV, the transmission rate of the vehicle-UAV uplink, and the transmission rate of the vehicle's own link.
The resources available to the vehicle comprise the computing and cache resources of the vehicle itself, which are used when vehicle i executes a task locally, where i denotes the index of the vehicle. The resources available to the UAV comprise its available computing and cache resources, where j denotes the index of the UAV; the UAV allocates its computing and cache resources to its associated vehicles according to allocation proportions. The condition for a UAV and a vehicle to successfully complete a task is that the cache resources the UAV allocates to the vehicle are no less than the data amount required by the task. For vehicle i ∈ N(t), the total time T_i(t) from generating the task to receiving the processing result is the transmission delay plus the computation delay of whichever side executes the task, where e_{j,i}(t) is the transmission rate of the vehicle-UAV uplink and e_i(t) is the transmission rate of the vehicle's own link; T_i(t) constitutes the resource allocation model for Internet of Vehicles resource allocation.
According to the embodiment of the application, after a vehicle task is generated, the association mode of the vehicle and the UAV is determined, the dynamic resource allocation proportions of the Internet of Vehicles are adjusted reasonably, and a resource management model is established so as to maximize the number of tasks successfully completed by vehicles and UAVs, thereby meeting the delay limits and quality-of-service requirements of vehicle tasks.
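The delay model just described can be sketched as follows. Since the exact closed form of T_i(t) is given in the source only in outline, this hedged sketch assumes the common decomposition of transmission delay as data amount over link rate and computation delay as required CPU cycles over (allocated) computing capacity; all parameter names (`rate_up`, `f_mec`, `rho_co`, ...) are introduced here for illustration.

```python
def total_task_time(b_i, c_i, d_i, rate_up, rate_local, f_mec, f_local,
                    rho_co=1.0):
    """Total time T_i(t) from task generation to result, per association mode.

    b_i        -- 1: offload to the UAV's MEC server, 0: compute locally
    c_i        -- computing resources required by the task (CPU cycles)
    d_i        -- data amount required by the task (bits)
    rate_up    -- vehicle-UAV uplink rate e_{j,i}(t) (bits/s)
    rate_local -- rate of the vehicle's own link e_i(t) (bits/s)
    f_mec      -- UAV computing capacity (cycles/s); rho_co is the
                  proportion allocated to this vehicle
    f_local    -- vehicle's own computing capacity (cycles/s)
    """
    if b_i == 1:
        # transmission over the uplink, then computation on the MEC share
        return d_i / rate_up + c_i / (rho_co * f_mec)
    # local execution over the vehicle's own link and processor
    return d_i / rate_local + c_i / f_local
```

A task of 10^9 cycles and 10^6 bits, offloaded over a 1 Mbit/s uplink to half of a 10 GHz MEC server, takes 1 s of transmission plus 0.2 s of computation under these assumptions.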
Step 300: solving the resource allocation model based on a reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy.
The electronic device solves the resource allocation model based on a reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy.
Specifically, in an embodiment, referring to Fig. 4, solving the resource allocation model based on the reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy comprises:
Step 310: after establishing the resource allocation model, the electronic device obtains an optimization problem function F, which maximizes the number of tasks completed within their delay limits, where b(t) is the vehicle association mode matrix, f^co(t) is the allocation matrix of UAV computing resources, f^ca(t) is the allocation matrix of UAV cache resources, and H(·) is the step function, taking the value 1 when its argument is greater than or equal to 0 and 0 otherwise. That is, a vehicle i that is allocated sufficient cache resources and meets the task delay requirement counts toward the objective, and the constraints ensure that the computing resources and cache resources are utilized maximally.
Step 320: the electronic device transforms the optimization problem function based on a reinforcement learning method. Since the optimization problem function F is non-convex and of high complexity, the embodiment of the present application transforms F with a reinforcement learning method.
Specifically, in an embodiment, step 320 of transforming the optimization problem function based on the reinforcement learning method comprises:
transforming the optimization problem function into an environment state space set, an action decision set, and a reward function. The environment state space set comprises the amount of computing resources required by the vehicle tasks, the amount of data required by the tasks, the maximum delay the tasks can tolerate, the vehicle positions, and the UAV positions; the action decision set comprises the vehicle association mode and the proportions of computing and cache resources the UAV allocates to each associated vehicle; the reward function is constructed from the maximum delay the vehicle tasks can tolerate and the cache resources the vehicles possess.
The environment state space set S collects, for each time slot, the task parameters of all vehicles together with the three-dimensional positions of the M UAVs:
S(t) ⊇ {x'_1(t), x'_2(t), ..., x'_M(t), y'_1(t), y'_2(t), ..., y'_M(t), z'_1(t), z'_2(t), ..., z'_M(t)}.
Let N'_j (j ∈ {1, 2, ..., M}) denote the number of vehicles associated with UAV j, and define the action space A: at time t the UAV selects, according to the current policy π, the vehicle association mode together with the proportions of computing and cache resources allocated to the associated vehicles; these together form the action decision a(t).
After the action decision a(t) is executed in the environment state s(t), a reward is returned to the UAV, defined by a reward function R that guides the UAV's policy updates. Two reward elements are defined: one based on the amount of computing resources required by the task, the amount of data required by the task, and the maximum delay the task can tolerate, and one based on the computing and cache resources possessed by a vehicle executing the task itself.
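The state-action-reward transformation above can be summarized as plain data structures. The field groupings follow the source's description; the ±1 reward shaping is an assumption introduced here for illustration, since the source gives the reward elements only in outline.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class State:
    """Environment state s(t): per-vehicle task descriptors plus the
    vehicle and UAV positions."""
    task_cycles: List[float]                    # computing resources each task needs
    task_data: List[float]                      # data amount each task needs
    task_deadline: List[float]                  # maximum tolerable delay per task
    vehicle_pos: List[Tuple[float, float]]      # (x, y) per vehicle
    drone_pos: List[Tuple[float, float, float]] # (x', y', z') per UAV

@dataclass
class Action:
    """Action a(t): association mode plus the resource proportions the
    UAV grants each associated vehicle."""
    assoc: List[int]            # b_i(t) in {0, 1}
    compute_share: List[float]  # proportion of UAV computing resources
    cache_share: List[float]    # proportion of UAV cache resources

def reward(delays, deadlines, cache_alloc, data_req):
    """Reward built from the delay limit and the caching constraint:
    +1 per task finishing in time with enough cache, -1 otherwise
    (assumed shaping, not taken from the source)."""
    r = 0.0
    for t, t_max, cache, d in zip(delays, deadlines, cache_alloc, data_req):
        r += 1.0 if (t <= t_max and cache >= d) else -1.0
    return r
```

The reward averages naturally over vehicles, matching the later definition of r(t) as the average reward of the vehicles if divided by the vehicle count.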
Step 330: solving the transformed result with the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy.
The electronic device solves the transformed result with the deep deterministic policy gradient method to obtain the optimal vehicle association mode and a resource allocation strategy.
The electronic device solves the transformed optimization problem function with the deep deterministic policy gradient method; a deep reinforcement learning model based on this method is shown in Fig. 5.
In the embodiment of the present application, the unmanned aerial vehicle (electronic device) serves as the agent: it selects and executes actions based on the current state, obtains rewards, and updates toward the optimal policy through this feedback. From the determined S, A and R, an evaluation function Q is obtained, expressed as
Q(s, a) = E[∑_t γ^t r(t)],
where E denotes expectation, r(t) denotes the instant reward returned to the unmanned aerial vehicle at time t, defined as the average reward of the vehicles, and γ < 1 is the discount factor.
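The expectation above is taken over the discounted sum of instant rewards; a minimal helper computing that sum for one sampled trajectory (the γ value is only an example) can be sketched as:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma^t * r(t): the quantity whose expectation defines Q."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total
```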
In one embodiment, step 330 of solving the converted result according to the deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy includes:
Step 331, initializing the network parameters of the deep deterministic policy gradient method, selecting and executing an action decision from the action decision set based on the state of the environment state space set, and obtaining the reward function;
and step 332, training the networks of the deep deterministic policy gradient method with empirical data as the training set, and updating the network parameters to obtain the optimal vehicle association mode and the resource allocation strategy.
Specifically, the DDPG method (i.e., the deep deterministic policy gradient method) includes an Actor network and a Critic network. The Actor network is used to generate the current policy, and the Critic network is used to evaluate how good that policy is in the current state. An algorithm flow diagram of the deep deterministic policy gradient method according to the embodiment of the present application is shown in fig. 6.
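A compact numpy sketch of the two networks: a small multilayer perceptron serves as the Actor (squashed to [0, 1] to emit allocation proportions) and another as the Critic taking the state-action pair. The layer sizes, initialization, and dimensions are illustrative assumptions; the patent does not specify network architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """One (weights, bias) pair per layer."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x, out_act=None):
    """Forward pass with tanh hidden layers and an optional output activation."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return out_act(x) if out_act else x

state_dim, action_dim = 8, 3
actor = mlp_init([state_dim, 32, action_dim])        # policy pi(s) -> a
critic = mlp_init([state_dim + action_dim, 32, 1])   # Q(s, a) -> scalar

s = rng.normal(size=state_dim)
a = mlp_forward(actor, s, out_act=lambda z: (np.tanh(z) + 1.0) / 2.0)  # fractions in [0, 1]
q = mlp_forward(critic, np.concatenate([s, a]))
```

The output squashing keeps the computing- and cache-resource proportions inside [0, 1]; the Critic consumes the concatenated state-action vector, which is the standard DDPG arrangement.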
In order to improve the stability of training, a Target-Actor network and a Target-Critic network are introduced, and step 330 of the embodiment of the present application includes the following specific steps:
1) Initialize the Actor network π and the Critic network Q with network parameters θ^π and θ^Q;
2) Initialize the Target-Actor network π′ and the Target-Critic network Q′ with network parameters θ^π′ and θ^Q′;
3) Initialize the successful experience cache pool R_success and the failure experience cache pool R_failure.
4) For each round (episode), loop over the following steps:
(1) Select an initial state s_1;
(2) For each step in the round, loop over the following:
① According to the current input state s_t, the Actor network outputs an action a_t; the instant reward r_t and the next state s_(t+1) are received, yielding the experience data (s_t, a_t, r_t, s_(t+1));
② Determine whether this round of learning has terminated successfully; if so, store the experience data (s_t, a_t, r_t, s_(t+1)) in the successful experience cache pool R_success; otherwise, execute step ③;
③ Store the experience data (s_t, a_t, r_t, s_(t+1)) in the failure experience cache pool R_failure, and take N_failure experiences from R_success and also place them in R_failure;
④ Randomly sample m experience data (s_i, a_i, r_i, s_(i+1)), i ≤ m, from the two experience pools;
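Steps ② to ④ above, the two experience pools with random sampling across both, can be sketched as follows; the capacity and the exact mixing of the pools at sampling time are assumptions:

```python
import random
from collections import deque

class DualReplay:
    """Success/failure experience pools with random sampling over both."""
    def __init__(self, capacity=10000):
        self.success = deque(maxlen=capacity)  # R_success
        self.failure = deque(maxlen=capacity)  # R_failure

    def store(self, transition, succeeded):
        """transition is a tuple (s_t, a_t, r_t, s_next)."""
        (self.success if succeeded else self.failure).append(transition)

    def sample(self, m):
        """Randomly sample up to m transitions drawn from the two pools."""
        pool = list(self.success) + list(self.failure)
        return random.sample(pool, min(m, len(pool)))
```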
⑤ Calculate the expected return of the current action through the Target-Critic network:
y_i = r_i + γQ′(s_(i+1), π′, θ^Q′)
⑥ Update the Critic network parameters by minimizing the loss function L(θ^Q) = (1/m) ∑_(i=1)^m (y_i − Q(s_i, a_i; θ^Q))²;
⑦ Update the Actor network parameters through the policy gradient ∇_(θ^π) J ≈ (1/m) ∑_(i=1)^m ∇_a Q(s, a; θ^Q)|_(s=s_i, a=π(s_i)) ∇_(θ^π) π(s; θ^π)|_(s=s_i);
⑧ Softly update the Target-Actor and Target-Critic network parameters: θ^π′ ← τθ^π + (1 − τ)θ^π′ and θ^Q′ ← τθ^Q + (1 − τ)θ^Q′, where τ ≪ 1 is the soft update coefficient.
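Under the standard DDPG conventions, the target computation of step ⑤ and the soft update of step ⑧ reduce to two small helpers; scalar and list parameters are used here purely for illustration:

```python
def td_target(reward, q_next, gamma=0.9, done=False):
    """y_i = r_i + gamma * Q'(s_{i+1}, pi') for non-terminal transitions."""
    return reward if done else reward + gamma * q_next

def soft_update(target_params, source_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta', elementwise."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(source_params, target_params)]
```

Because τ ≪ 1, the target networks trail the online networks slowly, which is what stabilizes the bootstrapped targets y_i.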
(3) End the step loop.
5) End the round (episode) loop.
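The episode/step loop structure of steps 4) and 5) can be skeletonized as below; `ToyEnv`, the fixed policy, and the horizon are stand-ins for the vehicular environment, not the patent's model:

```python
def run_episode(env, policy, buffer, max_steps=50):
    """One round: act, observe reward and next state, record transitions."""
    s = env.reset()
    for _ in range(max_steps):
        a = policy(s)
        s_next, r, done = env.step(a)
        buffer.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return buffer

class ToyEnv:
    """Stand-in environment; a real one would model vehicles and the UAV."""
    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, a):
        self.t += 1
        return float(self.t), -abs(a), self.t >= self.horizon

buf = run_episode(ToyEnv(), policy=lambda s: 0.5, buffer=[])
```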
After training on the training set is finished, the optimization objective function is solved to obtain the optimal strategy for the vehicle association mode and the resource allocation proportion, so that the delay limits and quality-of-service requirements of vehicle tasks are met and the performance of vehicle-mounted communication is thereby improved.
In the embodiment of the present application, the reinforcement learning method and the deep deterministic policy gradient (DDPG) method are used to transform and solve the optimization problem function, so that the joint decision on vehicle association mode and resource allocation in the Internet of Vehicles is made effectively, the delay constraints and task quality-of-service requirements of the vehicles are met, the performance of vehicle-mounted communication is improved, and the optimization over a series of continuous vehicle action spaces is stable and converges quickly.
The vehicle position at the next moment is predicted from the detected trajectory point data of the vehicle, so that vehicle mobility is taken into account in the interaction of communication data between vehicles; a task offloading request of the vehicle is received, and a resource allocation model for allocating Internet-of-Vehicles resources to the vehicle is established based on the task offloading request; the resource allocation model is then solved based on the reinforcement learning method and the deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy. The embodiment of the present application therefore takes the time-varying nature of resources into account and allocates the limited resources reasonably and dynamically to the vehicles requesting them, so that the delay limits and quality-of-service requirements of vehicle tasks are met, the performance of vehicle-mounted communication is improved, and the optimization over a series of continuous vehicle action spaces is stable and converges quickly.
The unmanned aerial vehicle-assisted Internet of Vehicles resource allocation device provided by the embodiment of the present application is described below; the device described below and the unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method described above may be referred to correspondingly.
Referring to fig. 7, an embodiment of the present application provides an unmanned aerial vehicle assisted internet of vehicles resource allocation device, including:
a vehicle position prediction module 201, configured to predict a vehicle position at a next moment of the vehicle according to the detected trajectory point data of the vehicle;
a resource allocation model establishing module 202, configured to receive a task unloading request of the vehicle, and establish a resource allocation model for allocating resources of the vehicle in the internet of vehicles based on the task unloading request; the task unloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task and the maximum delay tolerable by the task; the vehicle association mode comprises a vehicle and unmanned aerial vehicle association mode and a vehicle and unmanned aerial vehicle non-association mode;
and the solving module 203 is used for solving the resource allocation model based on a reinforcement learning method and a deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy.
According to the unmanned aerial vehicle-assisted Internet of Vehicles resource allocation device, the vehicle position at the next moment is predicted from the detected trajectory point data of the vehicle, so that vehicle mobility is taken into account and communication data are exchanged between vehicles in time; a task offloading request of the vehicle is received, and a resource allocation model for allocating Internet-of-Vehicles resources is established based on the task offloading request; the resource allocation model is solved based on a reinforcement learning method and a deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy. The embodiment of the present application therefore takes the time-varying nature of resources into account and allocates the limited resources reasonably and dynamically to the vehicles requesting them, so that the delay limits and quality-of-service requirements of vehicle tasks are met, the performance of vehicle-mounted communication is improved, and the optimization over a series of continuous vehicle action spaces is stable and converges quickly.
In one embodiment, the vehicle position prediction module comprises:
a trajectory point data determination module for determining a plurality of detected trajectory point data of the vehicle;
the speed and acceleration calculation module is used for calculating the speed and acceleration corresponding to the plurality of track point data of the vehicle;
and the position prediction module is used for calculating the distance between the vehicle and the unmanned aerial vehicle and the azimuth angle of the vehicle according to the speeds and the accelerations of the plurality of track point data.
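A kinematic sketch of what such modules might compute, assuming a constant-acceleration motion model and a UAV hovering at height z; the patent does not fix the exact formulas, so these are illustrative:

```python
import math

def predict_next_position(x, y, vx, vy, ax, ay, dt=1.0):
    """Extrapolate the vehicle position one interval ahead."""
    return (x + vx * dt + 0.5 * ax * dt ** 2,
            y + vy * dt + 0.5 * ay * dt ** 2)

def distance_and_azimuth(vehicle_xy, uav_xyz):
    """3D distance to the UAV and the vehicle's azimuth in the ground plane."""
    dx = uav_xyz[0] - vehicle_xy[0]
    dy = uav_xyz[1] - vehicle_xy[1]
    dist = math.sqrt(dx ** 2 + dy ** 2 + uav_xyz[2] ** 2)
    azimuth = math.atan2(dy, dx)
    return dist, azimuth
```

The speed and acceleration inputs would come from finite differences over consecutive trajectory points, as the speed and acceleration calculation module describes.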
In one embodiment, the resource allocation model building module is specifically configured to build a resource allocation model for allocating the vehicle-networking resources to the vehicle based on the task unloading request, the vehicle available resources, the drone available resources, the transmission rate of the vehicle and drone uplink, and the transmission rate of the vehicle own link.
In one embodiment, the solving module comprises:
an optimization problem function obtaining module, configured to obtain an optimization problem function based on the resource allocation model;
the conversion module is used for converting the optimization problem function based on a reinforcement learning method;
and the final solving module is used for solving the converted result according to a deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy.
In one embodiment, the conversion module is specifically configured to convert the optimization problem function into an environment state space set, an action decision set, and a reward function;
the environment state space set comprises the amount of computing resources required by the vehicle task, the amount of data required by the task, the maximum delay tolerable by the task, the vehicle position and the unmanned aerial vehicle position;
the action decision set comprises a vehicle association mode, and a computing resource proportion and a cache resource proportion which are distributed to an associated vehicle by the unmanned aerial vehicle;
the reward function is constructed based on the maximum delay tolerable by the vehicle task and the cache resources possessed by the vehicle.
In one embodiment, the final solution module is configured to:
initializing the network parameters of the deep deterministic policy gradient method, selecting and executing an action decision from the action decision set based on the state of the environment state space set, and obtaining the reward function;
and training the networks of the deep deterministic policy gradient method with empirical data as the training set, and updating the network parameters to obtain the optimal vehicle association mode and the resource allocation strategy.
Fig. 8 illustrates a physical structure diagram of an electronic device. As shown in fig. 8, the electronic device may include: a processor (processor) 810, a communication interface (Communication Interface) 820, a memory (memory) 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may invoke the computer program in the memory 830 to perform the steps of the unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method, for example including: predicting the vehicle position at the next moment of the vehicle according to the detected trajectory point data of the vehicle; receiving a task offloading request of the vehicle, and establishing a resource allocation model for allocating Internet-of-Vehicles resources to the vehicle based on the task offloading request, wherein the task offloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task and the maximum delay tolerable by the task, and the vehicle association mode comprises a mode in which the vehicle is associated with the unmanned aerial vehicle and a mode in which it is not; and solving the resource allocation model based on a reinforcement learning method and a deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy.
In addition, the logic instructions in the memory 830 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present application further provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer is able to execute the steps of the unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method provided in the foregoing embodiments, for example including: predicting the vehicle position at the next moment of the vehicle according to the detected trajectory point data of the vehicle; receiving a task offloading request of the vehicle, and establishing a resource allocation model for allocating Internet-of-Vehicles resources to the vehicle based on the task offloading request, wherein the task offloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task and the maximum delay tolerable by the task, and the vehicle association mode comprises a mode in which the vehicle is associated with the unmanned aerial vehicle and a mode in which it is not; and solving the resource allocation model based on a reinforcement learning method and a deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy.
On the other hand, embodiments of the present application further provide a processor-readable storage medium storing a computer program, the computer program being configured to cause a processor to perform the steps of the method provided in each of the above embodiments, for example including: predicting the vehicle position at the next moment of the vehicle according to the detected trajectory point data of the vehicle; receiving a task offloading request of the vehicle, and establishing a resource allocation model for allocating Internet-of-Vehicles resources to the vehicle based on the task offloading request, wherein the task offloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task and the maximum delay tolerable by the task, and the vehicle association mode comprises a mode in which the vehicle is associated with the unmanned aerial vehicle and a mode in which it is not; and solving the resource allocation model based on a reinforcement learning method and a deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on the understanding, the above technical solutions substantially or otherwise contributing to the prior art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (10)
1. An unmanned aerial vehicle-assisted Internet of vehicles resource allocation method is characterized by comprising the following steps:
predicting the vehicle position at the next moment of the vehicle according to the detected track point data of the vehicle;
receiving a task unloading request of the vehicle, and establishing a resource allocation model for allocating the vehicle networking resources for the vehicle based on the task unloading request; the task unloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task and the maximum delay tolerable by the task; the vehicle association mode comprises a vehicle and unmanned aerial vehicle association mode and a vehicle and unmanned aerial vehicle non-association mode;
and solving the resource allocation model based on a reinforcement learning method and a deep deterministic policy gradient method to obtain an optimal vehicle association mode and a resource allocation strategy.
2. The unmanned aerial vehicle-assisted internet of vehicles resource allocation method of claim 1, wherein the predicting the vehicle position at the next moment of the vehicle according to the detected vehicle trajectory point data comprises:
determining a plurality of detected trajectory point data for the vehicle;
calculating the corresponding speed and acceleration of the plurality of track point data of the vehicle;
and calculating the distance between the vehicle and the unmanned aerial vehicle and the azimuth angle of the vehicle according to the speed and the acceleration of the plurality of track point data.
3. The unmanned aerial vehicle-assisted internet of vehicles resource allocation method of claim 1, wherein the establishing a resource allocation model for internet of vehicles resource allocation for vehicles based on the task offloading request comprises:
and establishing a resource allocation model for allocating the vehicle networking resources for the vehicle based on the task unloading request, the available resources of the vehicle, the available resources of the unmanned aerial vehicle, the transmission rate of the uplink of the vehicle and the unmanned aerial vehicle and the transmission rate of the link of the vehicle.
4. The unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method of claim 1, wherein the solving of the resource allocation model based on a reinforcement learning method and a deep deterministic policy gradient method to obtain an optimal vehicle association mode and a resource allocation strategy comprises:
obtaining an optimization problem function based on the resource allocation model;
converting the optimization problem function based on a reinforcement learning method;
and solving the converted result according to a deep deterministic policy gradient method to obtain an optimal vehicle association mode and a resource allocation strategy.
5. The unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method of claim 4, wherein transforming the optimization problem function based on a reinforcement learning method comprises:
converting the optimization problem function into an environment state space set, an action decision set and a reward function;
the environment state space set comprises the amount of computing resources required by the vehicle task, the amount of data required by the task, the maximum delay tolerable by the task, the vehicle position and the unmanned aerial vehicle position;
the action decision set comprises a vehicle association mode, and a computing resource proportion and a cache resource proportion which are distributed to an associated vehicle by the unmanned aerial vehicle;
the reward function is constructed based on the maximum delay tolerable by the vehicle task and the cache resources possessed by the vehicle.
6. The unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method of claim 5, wherein solving the converted result according to the deep deterministic policy gradient method to obtain the optimal vehicle association mode and the resource allocation strategy comprises:
initializing the network parameters of the deep deterministic policy gradient method, selecting and executing an action decision from the action decision set based on the state of the environment state space set, and obtaining the reward function;
and training the networks of the deep deterministic policy gradient method with empirical data as the training set, and updating the network parameters to obtain an optimal vehicle association mode and a resource allocation strategy.
7. An unmanned aerial vehicle-assisted Internet of Vehicles resource allocation device, characterized by comprising:
the vehicle position prediction module is used for predicting the vehicle position at the next moment of the vehicle according to the track point data of the detected vehicle;
the resource allocation model establishing module is used for receiving a task unloading request of the vehicle and establishing a resource allocation model for allocating the vehicle networking resources for the vehicle based on the task unloading request; the task unloading request comprises a vehicle association mode, the amount of computing resources required by the task, the amount of data required by the task and the maximum delay tolerable by the task; the vehicle association mode comprises a vehicle and unmanned aerial vehicle association mode and a vehicle and unmanned aerial vehicle non-association mode;
and the solving module is used for solving the resource allocation model based on a reinforcement learning method and a deep deterministic policy gradient method to obtain an optimal vehicle association mode and a resource allocation strategy.
8. An electronic device comprising a processor and a memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method of any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the unmanned aerial vehicle-assisted Internet of Vehicles resource allocation method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210612465.7A CN115002725A (en) | 2022-05-31 | 2022-05-31 | Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210612465.7A CN115002725A (en) | 2022-05-31 | 2022-05-31 | Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115002725A true CN115002725A (en) | 2022-09-02 |
Family
ID=83032102
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210612465.7A Pending CN115002725A (en) | 2022-05-31 | 2022-05-31 | Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115002725A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11886993B2 (en) | Method and apparatus for task scheduling based on deep reinforcement learning, and device | |
CN111586696B (en) | Resource allocation and unloading decision method based on multi-agent architecture reinforcement learning | |
CN107766135B (en) | Task allocation method based on particle swarm optimization and simulated annealing optimization in moving cloud | |
CN109068391B (en) | Internet of vehicles communication optimization algorithm based on edge calculation and Actor-Critic algorithm | |
CN111405569A (en) | Calculation unloading and resource allocation method and device based on deep reinforcement learning | |
Yang et al. | A parallel intelligence-driven resource scheduling scheme for digital twins-based intelligent vehicular systems | |
Chen et al. | Efficiency and fairness oriented dynamic task offloading in internet of vehicles | |
CN111711666B (en) | Internet of vehicles cloud computing resource optimization method based on reinforcement learning | |
CN113346944A (en) | Time delay minimization calculation task unloading method and system in air-space-ground integrated network | |
CN112422644B (en) | Method and system for unloading computing tasks, electronic device and storage medium | |
CN113543074A (en) | Joint computing migration and resource allocation method based on vehicle-road cloud cooperation | |
CN113645273B (en) | Internet of vehicles task unloading method based on service priority | |
Nguyen et al. | Flexible computation offloading in a fuzzy-based mobile edge orchestrator for IoT applications | |
Lin et al. | Deep reinforcement learning-based task scheduling and resource allocation for NOMA-MEC in Industrial Internet of Things | |
Ahmed et al. | MARL based resource allocation scheme leveraging vehicular cloudlet in automotive-industry 5.0 | |
Li et al. | DNN Partition and Offloading Strategy with Improved Particle Swarm Genetic Algorithm in VEC | |
Hou et al. | Hierarchical task offloading for vehicular fog computing based on multi-agent deep reinforcement learning | |
CN113709249A (en) | Safe balanced unloading method and system for driving assisting service | |
CN117221951A (en) | Task unloading method based on deep reinforcement learning in vehicle-mounted edge environment | |
CN116916272A (en) | Resource allocation and task unloading method and system based on automatic driving automobile network | |
CN115002725A (en) | Unmanned aerial vehicle-assisted Internet of vehicles resource allocation method and device and electronic equipment | |
CN116009590A (en) | Unmanned aerial vehicle network distributed track planning method, system, equipment and medium | |
CN115550357A (en) | Multi-agent multi-task cooperative unloading method | |
CN114928826A (en) | Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation | |
CN114980127A (en) | Calculation unloading method based on federal reinforcement learning in fog wireless access network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||