CN114020016A - Air-ground cooperative communication service method and system based on machine learning


Info

Publication number
CN114020016A
CN114020016A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
unmanned
vehicle
communication service
Prior art date
Legal status
Granted
Application number
CN202111271084.9A
Other languages
Chinese (zh)
Other versions
CN114020016B (en)
Inventor
白成超
郭继峰
颜鹏
郑红星
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202111271084.9A priority Critical patent/CN114020016B/en
Publication of CN114020016A publication Critical patent/CN114020016A/en
Application granted granted Critical
Publication of CN114020016B publication Critical patent/CN114020016B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05D — SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 — Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10 — Simultaneous control of position or course in three dimensions
    • G05D1/101 — Simultaneous control of position or course in three dimensions specially adapted for aircraft


Abstract

An air-ground cooperative communication service method and system based on machine learning, relating to the technical field of air-ground cooperative communication services, for solving the problems of low service quality and low efficiency caused by providing communication services with unmanned aerial vehicles alone in the prior art. The technical points of the invention comprise: acquiring the environment information of each unmanned aerial vehicle and each unmanned vehicle in the communication service; and inputting the environment information into a pre-trained deep neural network model and solving to obtain cooperative communication service strategy instructions for the unmanned aerial vehicles and unmanned vehicles. The invention can restore communication between ground users and the outside, or among ground users, after a ground communication base station is damaged, and can compensate for an insufficient number of available mobile communication devices.

Description

Air-ground cooperative communication service method and system based on machine learning
Technical Field
The invention relates to the technical field of air-ground cooperative communication service, in particular to an air-ground cooperative communication service method and system based on machine learning.
Background
There are two main ways for an unmanned aerial vehicle to provide communication services to ground users. The first is to use the unmanned aerial vehicle as a communication relay point: in this mode, the unmanned aerial vehicle forwards communication messages between the users and the base station, thereby providing communication services to the users. In this service mode, the placement of the unmanned aerial vehicles and the assignment of communication channels to different users need to be optimized. The general approach is to model these questions as a multi-objective optimization problem and solve it with convex optimization methods or intelligent algorithms to obtain the position distribution of the unmanned aerial vehicles and the allocation strategy for the communication channels. The second way is to use the unmanned aerial vehicle to provide communication services to users directly: in this mode, the unmanned aerial vehicle acts as an aerial communication base station. To provide high-quality communication services in this mode, the dynamic trajectory of the unmanned aerial vehicle must be optimized. Common practice is to model this as an optimization problem whose constraints include energy limits and maximum data throughput, with the objective of making the unmanned aerial vehicle provide the best possible communication service for the users; it is typically solved with convex optimization methods or machine learning methods.
Although the above approaches enable unmanned aerial vehicles to provide high-quality communication services in some simple environments, several problems remain unaddressed. First, in practice the number of available unmanned aerial vehicles is limited, so the services they can provide are also limited, and it is difficult to serve users distributed across diverse environments with high quality. Second, because communication links can be blocked by obstacles distributed in the environment, users behind obstacles have difficulty obtaining service from an unmanned aerial vehicle. Consequently, unmanned aerial vehicles alone cannot provide high-quality communication services to all users in wide-area, complex environments.
Disclosure of Invention
In view of the above problems, the present invention provides an air-ground cooperative communication service method and system based on machine learning, so as to solve the problems of low service quality and low efficiency caused by providing communication services with unmanned aerial vehicles alone in the prior art.
According to an aspect of the present invention, a method for air-ground cooperative communication service based on machine learning is provided, the method comprising the following steps:
step one, acquiring the environment information of each unmanned aerial vehicle and each unmanned vehicle in the communication service;
and step two, inputting the environment information into a pre-trained deep neural network model, and solving to obtain cooperative communication service strategy instructions for the unmanned aerial vehicles and unmanned vehicles.
Further, the environment information corresponding to each unmanned aerial vehicle in step one comprises user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of the several unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned vehicle, and position information of the several unmanned vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
Further, in step one, the user state information includes the position information of the several users having the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of the communication service quality; the ranking factor is calculated as:

ρ_k = λ_1 · d_ik / d_max + λ_2 · α_ik / π + λ_3 · Q_k^t / Q_max

where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it to user k; Q_k^t represents the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; λ_1, λ_2, λ_3 are scaling factors.
Further, the process of pre-training the deep neural network model in step two includes:

Step 2.1, initializing the communication service strategies π_{θ_u}, π_{θ_v} and the target strategies π_{θ′_u}, π_{θ′_v} of the unmanned aerial vehicles and unmanned vehicles, and initializing the unmanned aerial vehicle and unmanned vehicle value networks V_{φ_u}, V_{φ_v}; making the strategy network π_{θ_u} of the unmanned aerial vehicles identical to its target network π_{θ′_u}, i.e. θ′_u ← θ_u, and at the same time making the strategy network π_{θ_v} of the unmanned vehicles identical to its target network π_{θ′_v}, i.e. θ′_v ← θ_v;

Step 2.2, in each interaction period, the unmanned aerial vehicles and unmanned vehicles respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} with the environment, where o_t(u_i) represents the environment information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environment information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environment information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environment information observed by unmanned vehicle j at time t+1;

Step 2.3, calculating advantage functions from the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

A_t(u_i) = Σ_{l=0}^{T−t−1} γ^l · r_{t+l+1}(u_i) − V_{φ_u}(o_t(u_i))

A_t(v_j) = Σ_{l=0}^{T−t−1} γ^l · r_{t+l+1}(v_j) − V_{φ_v}(o_t(v_j))

where A_t(u_i) and A_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j, and γ is a discount factor between (0, 1);

Step 2.4, repeating steps 2.2 to 2.3 until the set maximum step length T is reached;

Step 2.5, using the interaction data collected in steps 2.2 to 2.4 and the calculated advantage functions, computing the loss values of the unmanned aerial vehicle strategy and the unmanned vehicle strategy as:

L^CLIP(θ_u) = −E_t[ min( r_i^t(θ_u) · A_t(u_i), clip(r_i^t(θ_u), 1−ε, 1+ε) · A_t(u_i) ) ]

L^CLIP(θ_v) = −E_t[ min( r_j^t(θ_v) · A_t(v_j), clip(r_j^t(θ_v), 1−ε, 1+ε) · A_t(v_j) ) ]

where L^CLIP(θ_u) and L^CLIP(θ_v) respectively represent the strategy loss values of the unmanned aerial vehicles and the unmanned vehicles; ε is a constant in the range (0, 1); r_i^t(θ_u) is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, and r_j^t(θ_v) is the ratio of the actual strategy to the target strategy of the unmanned vehicle;

Step 2.6, minimizing L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service strategy networks of the unmanned aerial vehicles and unmanned vehicles;

Step 2.7, using the interaction data collected in steps 2.2 to 2.4, computing the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φ_u) = E_t[ ( V_{φ_u}(o_t(u_i)) − R_t(u_i) )^2 ],  L^V(φ_v) = E_t[ ( V_{φ_v}(o_t(v_j)) − R_t(v_j) )^2 ]

where L^V(φ_u) is the loss value of the unmanned aerial vehicle value function, L^V(φ_v) is the loss value of the unmanned vehicle value function, and R_t denotes the discounted return;

Step 2.8, minimizing L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;

Step 2.9, updating the unmanned aerial vehicle target strategy network and the unmanned vehicle target strategy network: θ′_u ← θ_u, θ′_v ← θ_v;

Step 2.10, repeating steps 2.2 to 2.9 until the network training converges, obtaining the trained deep neural network model.
Further, the specific process of solving for the cooperative communication service strategy instructions of the unmanned aerial vehicles and unmanned vehicles in step two includes: the output values of the trained deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, where an unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and an unmanned vehicle control instruction is a combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction; the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value is selected as the actual unmanned aerial vehicle control instruction, and the combination of unmanned vehicle linear speed and angular speed control instructions corresponding to the maximum probability value is selected as the actual unmanned vehicle control instruction.
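The maximum-probability selection described above can be sketched as follows. The candidate command sets are illustrative assumptions (the patent does not enumerate the discrete commands); only the selection rule, taking the command with the highest network output probability, follows the text.

```python
# Assumed discrete UAV heading-deflection commands, in degrees (illustrative).
UAV_HEADING_COMMANDS = [-30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0]

# Assumed (linear speed m/s, angular speed rad/s) pairs for the unmanned vehicle.
UGV_COMMANDS = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.5), (1.0, -0.5),
                (2.0, 0.0), (2.0, 0.5), (2.0, -0.5)]

def select_command(probabilities, commands):
    """Pick the command whose network output probability is highest."""
    if len(probabilities) != len(commands):
        raise ValueError("one probability is expected per candidate command")
    best = max(range(len(commands)), key=lambda i: probabilities[i])
    return commands[best]

# Example: a softmax output over the 7 UAV heading commands.
uav_probs = [0.05, 0.10, 0.05, 0.40, 0.20, 0.10, 0.10]
print(select_command(uav_probs, UAV_HEADING_COMMANDS))  # → 0.0
```

The same function serves both agent types because, per the description, each command set is scored by one softmax head.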
According to another aspect of the present invention, a machine learning-based air-ground cooperative communication service system is provided, which includes:
the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and the unmanned vehicles in the communication service;
and the instruction resolving module is used for inputting the environment information into a pre-trained deep neural network model and solving to obtain cooperative communication service strategy instructions for the unmanned aerial vehicles and unmanned vehicles.
Further, the environment information corresponding to each unmanned aerial vehicle in the data acquisition module includes user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of the several unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle includes user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned vehicle, and position information of the several unmanned vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
Further, the user state information in the data acquisition module includes the position information of the several users having the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of the communication service quality; the ranking factor is calculated as:

ρ_k = λ_1 · d_ik / d_max + λ_2 · α_ik / π + λ_3 · Q_k^t / Q_max

where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it to user k; Q_k^t represents the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; λ_1, λ_2, λ_3 are scaling factors.
Further, the instruction resolving module comprises a model training submodule for pre-training the deep neural network model, and the pre-training process comprises the following steps:

Step 2.1, initializing the communication service strategies π_{θ_u}, π_{θ_v} and the target strategies π_{θ′_u}, π_{θ′_v} of the unmanned aerial vehicles and unmanned vehicles, and initializing the unmanned aerial vehicle and unmanned vehicle value networks V_{φ_u}, V_{φ_v}; making the strategy network π_{θ_u} of the unmanned aerial vehicles identical to its target network π_{θ′_u}, i.e. θ′_u ← θ_u, and at the same time making the strategy network π_{θ_v} of the unmanned vehicles identical to its target network π_{θ′_v}, i.e. θ′_v ← θ_v;

Step 2.2, in each interaction period, the unmanned aerial vehicles and unmanned vehicles respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} with the environment, where o_t(u_i) represents the environment information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environment information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environment information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environment information observed by unmanned vehicle j at time t+1;

Step 2.3, calculating advantage functions from the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

A_t(u_i) = Σ_{l=0}^{T−t−1} γ^l · r_{t+l+1}(u_i) − V_{φ_u}(o_t(u_i))

A_t(v_j) = Σ_{l=0}^{T−t−1} γ^l · r_{t+l+1}(v_j) − V_{φ_v}(o_t(v_j))

where A_t(u_i) and A_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j, and γ is a discount factor between (0, 1);

Step 2.4, repeating steps 2.2 to 2.3 until the set maximum step length T is reached;

Step 2.5, using the interaction data collected in steps 2.2 to 2.4 and the calculated advantage functions, computing the loss values of the unmanned aerial vehicle strategy and the unmanned vehicle strategy as:

L^CLIP(θ_u) = −E_t[ min( r_i^t(θ_u) · A_t(u_i), clip(r_i^t(θ_u), 1−ε, 1+ε) · A_t(u_i) ) ]

L^CLIP(θ_v) = −E_t[ min( r_j^t(θ_v) · A_t(v_j), clip(r_j^t(θ_v), 1−ε, 1+ε) · A_t(v_j) ) ]

where L^CLIP(θ_u) and L^CLIP(θ_v) respectively represent the strategy loss values of the unmanned aerial vehicles and the unmanned vehicles; ε is a constant in the range (0, 1); r_i^t(θ_u) is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, and r_j^t(θ_v) is the ratio of the actual strategy to the target strategy of the unmanned vehicle;

Step 2.6, minimizing L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service strategy networks of the unmanned aerial vehicles and unmanned vehicles;

Step 2.7, using the interaction data collected in steps 2.2 to 2.4, computing the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φ_u) = E_t[ ( V_{φ_u}(o_t(u_i)) − R_t(u_i) )^2 ],  L^V(φ_v) = E_t[ ( V_{φ_v}(o_t(v_j)) − R_t(v_j) )^2 ]

where L^V(φ_u) is the loss value of the unmanned aerial vehicle value function, L^V(φ_v) is the loss value of the unmanned vehicle value function, and R_t denotes the discounted return;

Step 2.8, minimizing L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;

Step 2.9, updating the unmanned aerial vehicle target strategy network and the unmanned vehicle target strategy network: θ′_u ← θ_u, θ′_v ← θ_v;

Step 2.10, repeating steps 2.2 to 2.9 until the network training converges, obtaining the trained deep neural network model.
Furthermore, the instruction resolving module further comprises a probability selection submodule, which selects, from the output values of the trained deep neural network model, the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value as the actual unmanned aerial vehicle control instruction, and selects the combination of the unmanned vehicle linear speed control instruction and unmanned vehicle angular speed control instruction corresponding to the maximum probability value as the actual unmanned vehicle control instruction; the output values of the deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, where an unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and an unmanned vehicle control instruction is a combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction.
The beneficial technical effects of the invention are as follows:
the invention provides a communication service for ground users by the cooperation of an unmanned aerial vehicle and an unmanned vehicle, which can solve the problem of mutual communication between the ground users and the outside or between the ground users after a ground communication base station is damaged, and can solve the problem of insufficient availability of mobile communication equipment. Compared with the traditional communication service method, the method has the following advantages: 1) the communication service system is provided with a plurality of unmanned aerial vehicles and unmanned vehicles, and can provide high-quality and fair communication service for ground users; 2) by adding unmanned vehicles into the communication service system, the problem of insufficient quantity of available communication service unmanned vehicles can be solved; 3) the cooperative communication service strategy of the unmanned aerial vehicle and the unmanned aerial vehicle is trained by using a deep reinforcement learning method, so that the cooperative communication service strategy can adapt to the change of the environment, has higher robustness and stronger environment adaptability, and can execute communication service tasks in various complex environments; 4) the number of the unmanned aerial vehicles and the number of the unmanned aerial vehicles can be adapted to change, and meanwhile, the number of the ground users can be adapted to change.
Drawings
The present invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification, and which are used to further illustrate preferred embodiments of the present invention and to explain the principles and advantages of the present invention.
Fig. 1 is a schematic view of a communication service scenario between an unmanned vehicle and an unmanned aerial vehicle in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a deep neural network structure in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the reward value curve obtained during the cooperative strategy training of the unmanned aerial vehicles and unmanned vehicles in the embodiment of the invention.
Fig. 4 is a trajectory graph of the cooperative communication service of the unmanned aerial vehicles and unmanned vehicles in the embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, exemplary embodiments or examples of the disclosure are described below with reference to the accompanying drawings. It is obvious that the described embodiments or examples are only some, but not all embodiments or examples of the invention. All other embodiments or examples obtained by a person of ordinary skill in the art based on the embodiments or examples of the present invention without any creative effort shall fall within the protection scope of the present invention.
In order to solve the problem of communication service of ground users, the invention provides an air-ground cooperative communication service method based on machine learning.
The embodiment of the invention provides an air-ground cooperative communication service method based on machine learning, which specifically comprises the following steps:
the method comprises the following steps: the unmanned aerial vehicle and the unmanned aerial vehicle acquire environment information in communication service;
A communication service scenario of the unmanned aerial vehicles and unmanned vehicles is shown in fig. 1. According to the embodiment of the invention, the environment information o_t(u_i) acquired by unmanned aerial vehicle u_i comprises three parts of content. The first part is the state information of users in the communication service area obtained by u_i, including the position information of the 5 users in the communication service area with the smallest ranking factor relative to u_i, where the position information comprises the distance d_ij and angle α_ij in the heading coordinate system of u_i, as well as the average communication service quality of all users and the standard deviation of the communication service quality. The ranking factor ρ_k of user k relative to unmanned aerial vehicle u_i is calculated as:

ρ_k = λ_1 · d_ik / d_max + λ_2 · α_ik / π + λ_3 · Q_k^t / Q_max

where d_ik represents the distance of u_i from user k; α_ik represents the angle between the velocity direction of u_i and the line connecting u_i and user k; Q_k^t represents the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; λ_1, λ_2, λ_3 are scaling factors.

The second part is the position information of the 3 unmanned aerial vehicles nearest to u_i, comprising the distance d_ij and angle α_ij, j = 1, 2, 3, in the heading coordinate system of u_i. The third part is the position information of the 3 unmanned vehicles nearest to u_i, likewise comprising the distance and angle in the heading coordinate system of u_i.
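The user-ranking step above can be sketched as follows. The exact formula image is not recoverable from the source, so the weighted sum used here is an assumed reconstruction from the stated terms (distance, angle, service quality, normalizers d_max and Q_max, weights λ_1 to λ_3), with smaller values ranking first:

```python
import math

def ranking_factor(d_ik, alpha_ik, q_kt, d_max, q_max,
                   lam1=1.0, lam2=1.0, lam3=1.0):
    """Assumed form of the ranking factor: closer, better-aligned users rank first."""
    return lam1 * d_ik / d_max + lam2 * alpha_ik / math.pi + lam3 * q_kt / q_max

def nearest_users(users, d_max, q_max, n=5):
    """Return the ids of the n users with the smallest ranking factor.

    `users` is a list of (user_id, d_ik, alpha_ik, q_kt) tuples (assumed layout).
    """
    ranked = sorted(users, key=lambda u: ranking_factor(u[1], u[2], u[3], d_max, q_max))
    return [u[0] for u in ranked[:n]]

users = [("k1", 50.0, 0.1, 0.2), ("k2", 10.0, 0.0, 0.1),
         ("k3", 90.0, 3.0, 0.9), ("k4", 30.0, 1.0, 0.5),
         ("k5", 20.0, 0.2, 0.3), ("k6", 70.0, 2.0, 0.8)]
print(nearest_users(users, d_max=100.0, q_max=1.0))  # → ['k2', 'k5', 'k1', 'k4', 'k6']
```

The sign of the quality term is also an assumption; in the patent the relative weighting is set through λ_1, λ_2, λ_3.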
The environment information perceived by the unmanned vehicles is similar to that perceived by the unmanned aerial vehicles, and likewise comprises three parts of information. The first part is the user state perceived by unmanned vehicle c_j, including the position information of the 5 users in the communication service area with the smallest ranking factor relative to c_j, the average communication service quality of all users, and the standard deviation of the communication service quality. The second part is the unmanned aerial vehicle states perceived by unmanned vehicle c_j, namely unmanned aerial vehicle position information. The third part is the states of the other unmanned vehicles perceived by unmanned vehicle c_j, namely the position information of the other unmanned vehicles.
Step two: inputting environmental information acquired by the unmanned aerial vehicle and the unmanned vehicle into a pre-trained deep neural network model, and resolving to obtain communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle;
According to the embodiment of the invention, the deep neural network structure is shown in fig. 2 and comprises a 3-layer fully-connected network: the first and second layers each have 128 nodes with rectified linear unit (ReLU) activation functions, and the third layer has 7 output nodes with a SoftMax activation function, which limits the output values to (0, 1).
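The network shape just described (two hidden layers of 128 ReLU nodes and a 7-node SoftMax output) can be sketched in plain Python. The weights are random placeholders and the observation length is an assumption; the patent trains the weights with the procedure below.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    # Subtract the max for numerical stability; outputs sum to 1 and lie in (0, 1).
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def linear(x, w, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def make_layer(n_in, n_out, rng):
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return (w, b)

def policy_forward(obs, layers):
    """128-ReLU, 128-ReLU, 7-SoftMax forward pass, as described in the text."""
    h = relu(linear(obs, *layers[0]))
    h = relu(linear(h, *layers[1]))
    return softmax(linear(h, *layers[2]))

rng = random.Random(0)
obs_dim = 16  # assumed observation length; the real size depends on the encoding
layers = [make_layer(obs_dim, 128, rng), make_layer(128, 128, rng), make_layer(128, 7, rng)]
probs = policy_forward([0.5] * obs_dim, layers)
print(len(probs), abs(sum(probs) - 1.0) < 1e-9)  # → 7 True
```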
The deep neural network pre-training process comprises: collecting the interaction data (i.e. environment information) of the unmanned aerial vehicles and unmanned vehicles with the environment, estimating the advantage functions A_t(u_i) and A_t(v_j) from these data, then calculating the strategy loss functions L^CLIP(θ_u), L^CLIP(θ_v) and the value-function loss functions L^V(φ_u), L^V(φ_v), and finally updating the strategy networks and value networks by minimizing the strategy loss functions and the value-function loss functions, thereby obtaining the trained deep neural network model. The specific training process is as follows:
(1) Initializing the communication service strategies π_{θ_u}, π_{θ_v} and the target strategies π_{θ′_u}, π_{θ′_v} of the unmanned aerial vehicles and unmanned vehicles, and initializing the unmanned aerial vehicle and unmanned vehicle value networks V_{φ_u}, V_{φ_v}; making the strategy network π_{θ_u} of the unmanned aerial vehicles identical to its target network π_{θ′_u}, i.e. θ′_u ← θ_u, and at the same time making the strategy network π_{θ_v} of the unmanned vehicles identical to its target network π_{θ′_v}, i.e. θ′_v ← θ_v;
(2) In each time step, namely an interaction period, the unmanned aerial vehicles and unmanned vehicles respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} with the environment, where o_t(u_i) represents the environment information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environment information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environment information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environment information observed by unmanned vehicle j at time t+1;
(3) Calculating advantage functions from the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

A_t(u_i) = Σ_{l=0}^{T−t−1} γ^l · r_{t+l+1}(u_i) − V_{φ_u}(o_t(u_i))

A_t(v_j) = Σ_{l=0}^{T−t−1} γ^l · r_{t+l+1}(v_j) − V_{φ_v}(o_t(v_j))

where A_t(u_i) and A_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j, and γ is a discount factor between (0, 1);
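The advantage computation in step (3) can be sketched as follows, under the assumption that the equation image reconstructs to the discounted sum of future rewards over the remaining rollout minus the value estimate of the current observation:

```python
def advantages(rewards, values, gamma=0.99):
    """rewards[t] is r_{t+1} from the rollout; values[t] is V(o_t).

    Returns the advantage A_t for each step of the rollout.
    """
    adv = []
    for t in range(len(rewards)):
        ret, discount = 0.0, 1.0
        for r in rewards[t:]:          # discounted return from step t onward
            ret += discount * r
            discount *= gamma
        adv.append(ret - values[t])    # subtract the value-network baseline
    return adv

print(advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0))  # → [2.5, 1.5, 0.5]
```

In the patent's setup this is computed once per agent (unmanned aerial vehicle i and unmanned vehicle j) using that agent's own value network.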
(4) repeating the steps (2) and (3) until the set maximum step length T is reached;
(5) Using the interaction data collected in steps (2), (3), and (4) and the computed advantage functions, calculate the strategy loss values of the unmanned aerial vehicles and unmanned vehicles as:

L^CLIP(θ_u) = −E_t[min(r_t^i(θ_u)·A_t(u_i), clip(r_t^i(θ_u), 1−ε, 1+ε)·A_t(u_i))]

L^CLIP(θ_v) = −E_t[min(r_t^j(θ_v)·A_t(v_j), clip(r_t^j(θ_v), 1−ε, 1+ε)·A_t(v_j))]

where L^CLIP(θ_u) and L^CLIP(θ_v) respectively represent the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned vehicle; ε is a constant with value range (0, 1); clip is a clipping function, and clip(r_t^i(θ_u), 1−ε, 1+ε) limits r_t^i(θ_u) to the interval [1−ε, 1+ε]; r_t^i(θ_u) is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, and r_t^j(θ_v) is the ratio of the actual strategy to the target strategy of the unmanned vehicle, calculated respectively as:

r_t^i(θ_u) = π_{θ_u}(a_t(u_i) | o_t(u_i)) / π_{θ'_u}(a_t(u_i) | o_t(u_i))

r_t^j(θ_v) = π_{θ_v}(a_t(v_j) | o_t(v_j)) / π_{θ'_v}(a_t(v_j) | o_t(v_j))
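The clipped strategy loss above can be sketched in a few lines. This is a plain-Python illustration (log-probabilities stand in for the network outputs; all names are hypothetical), written as a loss to be minimized, matching the minimization in the following step:

```python
import math

def clip(x, lo, hi):
    # limit x to [lo, hi], as clip(r, 1 - eps, 1 + eps) does in the text
    return max(lo, min(hi, x))

def ppo_policy_loss(logp_actual, logp_target, advantages, eps=0.2):
    """Clipped surrogate strategy loss, negated so that minimizing it
    improves the strategy. logp_actual / logp_target are log-probabilities
    of the executed actions under the current and target (old) strategies;
    eps lies in (0, 1)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_actual, logp_target, advantages):
        ratio = math.exp(lp_new - lp_old)          # r_t = pi_theta / pi_theta'
        unclipped = ratio * adv
        clipped = clip(ratio, 1 - eps, 1 + eps) * adv
        total += min(unclipped, clipped)
    return -total / len(advantages)

# identical strategies give ratio 1, so the loss is -mean(advantage)
print(ppo_policy_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # -2.0
```

The same loss would be evaluated separately for the unmanned aerial vehicle parameters θ_u and the unmanned vehicle parameters θ_v.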
(6) Minimize L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service strategy networks of the unmanned aerial vehicles and unmanned vehicles;
(7) Using the interaction data collected in steps (2), (3), and (4), calculate the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φ_u) = E_t[(r_{t+1}(u_i) + γ·V_{φ_u}(o_{t+1}(u_i)) − V_{φ_u}(o_t(u_i)))²]

L^V(φ_v) = E_t[(r_{t+1}(v_j) + γ·V_{φ_v}(o_{t+1}(v_j)) − V_{φ_v}(o_t(v_j)))²]

where L^V(φ_u) is the loss value of the unmanned aerial vehicle value function and L^V(φ_v) is the loss value of the unmanned vehicle value function;
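The value-function loss, a mean squared one-step temporal-difference error, can be sketched as follows (illustrative names, not the patent's code):

```python
def value_loss(rewards_next, v_obs, v_next_obs, gamma=0.99):
    """L^V = mean over t of (r_{t+1} + gamma * V(o_{t+1}) - V(o_t))^2.
    rewards_next, v_obs, v_next_obs are aligned per-step lists."""
    n = len(rewards_next)
    return sum(
        (rewards_next[t] + gamma * v_next_obs[t] - v_obs[t]) ** 2
        for t in range(n)
    ) / n

# a perfectly consistent value estimate gives zero loss
print(value_loss([1.0], [3.0], [4.0], gamma=0.5))  # (1 + 0.5*4 - 3)^2 = 0.0
```

Minimizing this quantity with respect to φ_u and φ_v is what step (8) describes.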
(8) Minimize L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;
(9) Update the unmanned aerial vehicle and unmanned vehicle target strategy networks: θ'_u ← θ_u, θ'_v ← θ_v;
(10) Repeat steps (2) to (9) until the network training converges, so as to obtain the trained deep neural network model.
During pre-training, the reward value obtained by unmanned aerial vehicle u_i can be expressed as:

r_t(u_i) = r_t^Q(u_i) + r_t^S(u_i) + r_t^R(u_i)

where the first term r_t^Q(u_i) relates to the users' communication service quality: when the users have a higher average communication service quality and a lower communication-service-quality variance, r_t^Q(u_i) is larger; otherwise it is smaller. The second term r_t^S(u_i) relates to the distances from u_i to the other unmanned aerial vehicles and the unmanned vehicles: when the distance between unmanned aerial vehicles, or between an unmanned aerial vehicle and an unmanned vehicle, is small, r_t^S(u_i) is negative; otherwise it is 0. The third term r_t^R(u_i) relates to the position of u_i within the communication service environment: when u_i is inside the communication service area, r_t^R(u_i) is 0; otherwise it is negative.
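The three-term reward structure described above can be sketched as follows. The patent gives the qualitative structure only, so the weights, thresholds, and function names below are illustrative assumptions:

```python
def drone_reward(avg_quality, quality_std, min_sep, sep_threshold,
                 inside_area, w_q=1.0, w_std=1.0, penalty=1.0):
    """Sketch of r_t(u_i) = r^Q + r^S + r^R with assumed coefficients."""
    # r^Q: larger when average service quality is high and its spread is low
    r_q = w_q * avg_quality - w_std * quality_std
    # r^S: negative when the closest drone/vehicle is nearer than a threshold
    r_s = -penalty if min_sep < sep_threshold else 0.0
    # r^R: zero inside the communication service area, negative outside
    r_r = 0.0 if inside_area else -penalty
    return r_q + r_s + r_r

print(drone_reward(1.0, 0.25, min_sep=5.0, sep_threshold=2.0,
                   inside_area=True))  # 0.75, no separation or boundary penalty
```

An analogous function would serve for the unmanned vehicles, since the text states their reward is designed the same way.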
The reward function of the unmanned vehicle is designed in the same way as that of the unmanned aerial vehicle. The communication service strategies of the unmanned aerial vehicles and unmanned vehicles are trained by deep reinforcement learning: through continuous interaction with the environment, the unmanned aerial vehicles and unmanned vehicles learn an effective cooperative communication service strategy and can provide high-quality and fair communication service for ground users. Pseudocode of the specific implementation process is shown in Table 1 below.
(Table 1: pseudocode of the cooperative communication service strategy training procedure.)
The environment information of the unmanned aerial vehicles and unmanned vehicles acquired in real time is fed through the trained deep neural network model. The model's output comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction. The unmanned aerial vehicle control instruction is a course deflection angle instruction of the unmanned aerial vehicle, in degrees; the unmanned vehicle control instruction is a combination of a linear speed control instruction and an angular speed control instruction of the unmanned vehicle. Finally, the course deflection angle with the maximum probability is selected as the actual control instruction of the unmanned aerial vehicle, and the linear speed and angular speed combination with the maximum probability is selected as the actual control instruction of the unmanned vehicle.
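The maximum-probability selection just described is a simple argmax over each agent's discrete command set. A minimal sketch (the discrete command values here are illustrative, not the patent's):

```python
def select_actions(drone_probs, vehicle_probs):
    """Pick the maximum-probability command for each agent.
    drone_probs: {course_deflection_deg: prob};
    vehicle_probs: {(linear_speed, angular_speed): prob}."""
    heading = max(drone_probs, key=drone_probs.get)
    v_omega = max(vehicle_probs, key=vehicle_probs.get)
    return heading, v_omega

drone_probs = {-30: 0.1, 0: 0.6, 30: 0.3}    # course deflection angles (degrees)
vehicle_probs = {(1.0, 0.0): 0.5, (0.5, 0.2): 0.3, (0.0, -0.2): 0.2}
print(select_actions(drone_probs, vehicle_probs))  # (0, (1.0, 0.0))
```

Sampling from the probabilities (rather than taking the argmax) is used during training; the deterministic argmax is what the text describes for deployment.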
The beneficial effects of the invention are further verified through experiments.
The correctness and rationality of the invention are verified by digital simulation. First, a communication service environment of size 500 m × 500 m × 150 m is constructed in a Python environment, containing 10 users and a dynamic communication service system consisting of several unmanned aerial vehicles and unmanned vehicles. The unmanned aerial vehicles fly at constant speed and constant altitude with a flight speed of 10 m/s; the maximum speed of the unmanned vehicles is 10 m/s; the maximum moving speed of a user is 1 m/s, and users move randomly within the communication service area. The simulation software environment is Windows 10 + Python 3.7, and the hardware environment is an AMD Ryzen 5 3550H CPU + 16.0 GB RAM.
The experiment first verifies whether the communication service control strategy training of the unmanned aerial vehicles and unmanned vehicles converges. 10000 training rounds are performed, the average reward value obtained by the unmanned aerial vehicles and unmanned vehicles over every 100 training rounds is recorded, and the curve is plotted as shown in fig. 3. As can be seen from fig. 3, as training progresses the unmanned aerial vehicles and unmanned vehicles obtain a stable reward value between 6.5 and 7, indicating that their communication service strategies approach convergence and that they can provide high-quality and fair communication service for the users.
Next, the cooperation strategy of the unmanned aerial vehicles and unmanned vehicles is experimentally verified; the result is shown in fig. 4. As can be seen from fig. 4, the unmanned aerial vehicles and unmanned vehicles provide communication services to different users, and the service provided is relatively uniform; that is, the unmanned aerial vehicles and unmanned vehicles can cooperate to provide fair communication service for users on the ground.
The invention provides communication service to ground users through the cooperation of unmanned aerial vehicles and unmanned vehicles, and can solve the problem of communication between ground users and the outside, or among ground users themselves, after ground communication base stations are damaged in a disaster. Meanwhile, the cooperation of unmanned aerial vehicles and unmanned vehicles alleviates the shortage of available mobile communication equipment and exploits the respective communication service advantages of both platforms. Compared with traditional communication service strategies, the learning-based air-ground cooperative communication service strategy of the invention has the following advantages: 1) the communication service system comprises multiple unmanned aerial vehicles and unmanned vehicles and can provide high-quality and fair communication service for ground users; 2) adding unmanned vehicles to the communication service system mitigates the shortage of available communication service unmanned aerial vehicles; 3) because the cooperative communication service strategy of the unmanned aerial vehicles and unmanned vehicles is trained by deep reinforcement learning, it can adapt to changes in the environment, has higher robustness and stronger environmental adaptability, and can execute communication service tasks in various complex environments. The air-ground cooperative communication service strategy provided by the invention can also adapt to changes in the number of unmanned aerial vehicles and unmanned vehicles, and in the number of ground users.
The method enables unmanned aerial vehicles and unmanned vehicles to cooperate in providing high-quality and fair communication service for ground users, and offers a new technical approach to post-disaster user communication service.
Another embodiment of the present invention provides an air-ground cooperative communication service system based on machine learning, including:
the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and unmanned vehicle in the communication service; the environment information corresponding to each unmanned aerial vehicle comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned vehicle; the position information comprises a distance parameter and an angle parameter; the user state information comprises position information of a plurality of users with the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of the communication service quality; the calculation formula of the ranking factor is as follows:
ρ_k = λ_1·(d_ik/d_max) + λ_2·α_ik + λ_3·(Q_t(k)/Q_max)

where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the included angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it with user k; Q_t(k) represents the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; λ_1, λ_2, λ_3 are proportionality coefficients;
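A ranking-factor computation of this kind can be sketched as below. The exact combining formula is rendered as an image in the original document, so the linear form here, like all the names, is an assumption built only from the quantities the text defines (normalized distance, heading angle, normalized service quality, and three proportionality coefficients):

```python
def ranking_factor(d_ik, alpha_ik, q_k, d_max, q_max,
                   lam1=1.0, lam2=1.0, lam3=1.0):
    # Assumed linear combination of the three defined quantities:
    # normalized distance, included angle, and normalized service quality.
    return lam1 * d_ik / d_max + lam2 * alpha_ik + lam3 * q_k / q_max

# keep the users with the smallest ranking factor, as the observation does
users = {"A": (100.0, 0.5, 2.0), "B": (400.0, 1.0, 8.0)}  # (d_ik, alpha_ik, q_k)
ranked = sorted(users, key=lambda k: ranking_factor(*users[k],
                                                    d_max=500.0, q_max=10.0))
print(ranked)  # ['A', 'B']
```

With this form, nearby users that are poorly served rank first, which matches the stated use of the factor to pick which users enter the observation.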
the instruction resolving module is used for inputting the environment information into a pre-trained deep neural network model and resolving to obtain cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle; it comprises a model training submodule and a probability selection submodule;
the model training submodule is used for pre-training the deep neural network model, and the pre-training process comprises the following steps:
Step 2-1: initialize the communication service strategies π_{θ_u} and π_{θ_v} of the unmanned aerial vehicles and unmanned vehicles, their target strategies π_{θ'_u} and π_{θ'_v}, and the value networks V_{φ_u} and V_{φ_v}; make the strategy network of each unmanned aerial vehicle, π_{θ_u}, identical to its target network, π_{θ'_u}, i.e. θ'_u = θ_u, and at the same time make the strategy network of each unmanned vehicle, π_{θ_v}, identical to its target network, π_{θ'_v}, i.e. θ'_v = θ_v;
Step two, in each interaction period, the unmanned aerial vehicle and the unmanned vehicle respectively collect interaction data { o } with the environmentt(ui),at(ui),rt+1(ui),ot+1(ui) And { o }t(vj),at(vj),rt+1(vj),ot+1(vj) In which o ist(ui) Representing the environmental information observed by drone i at time t,
Figure BDA00033280072500001210
representing the action command executed by the unmanned aerial vehicle i at time t, rt+1(ui) Indicating the prize value, o, received by drone i at time t +1t+1(ui) Representing environmental information observed by the unmanned aerial vehicle i at the moment t + 1; ot(vj) Representing environmental information observed by the unmanned vehicle j at time t,
Figure BDA00033280072500001211
indicates the action command, r, executed by the unmanned vehicle j at the time tt+1(vj) Indicating the reward value, o, received by the unmanned vehicle j at time t +1t+1(vj) Representing the environmental information observed by the unmanned vehicle j at the moment t + 1;
Step 2-3: calculate the advantage functions using the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

A_t(u_i) = r_{t+1}(u_i) + γ·V_{φ_u}(o_{t+1}(u_i)) − V_{φ_u}(o_t(u_i))

A_t(v_j) = r_{t+1}(v_j) + γ·V_{φ_v}(o_{t+1}(v_j)) − V_{φ_v}(o_t(v_j))

where A_t(u_i) and A_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j; γ is a discount factor with value in (0, 1);
Step 2-4: repeat Steps 2-2 and 2-3 until the set maximum step length T is reached;
Step 2-5: using the interaction data collected in Steps 2-2 to 2-4 and the computed advantage functions, calculate the strategy loss values of the unmanned aerial vehicles and unmanned vehicles as:

L^CLIP(θ_u) = −E_t[min(r_t^i(θ_u)·A_t(u_i), clip(r_t^i(θ_u), 1−ε, 1+ε)·A_t(u_i))]

L^CLIP(θ_v) = −E_t[min(r_t^j(θ_v)·A_t(v_j), clip(r_t^j(θ_v), 1−ε, 1+ε)·A_t(v_j))]

where L^CLIP(θ_u) and L^CLIP(θ_v) respectively represent the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned vehicle; ε is a constant with value range (0, 1); r_t^i(θ_u) is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, and r_t^j(θ_v) is the ratio of the actual strategy to the target strategy of the unmanned vehicle;
Step 2-6: minimize L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service strategy networks of the unmanned aerial vehicles and unmanned vehicles;
Step 2-7: using the interaction data collected in Steps 2-2 to 2-4, calculate the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φ_u) = E_t[(r_{t+1}(u_i) + γ·V_{φ_u}(o_{t+1}(u_i)) − V_{φ_u}(o_t(u_i)))²]

L^V(φ_v) = E_t[(r_{t+1}(v_j) + γ·V_{φ_v}(o_{t+1}(v_j)) − V_{φ_v}(o_t(v_j)))²]

where L^V(φ_u) is the loss value of the unmanned aerial vehicle value function and L^V(φ_v) is the loss value of the unmanned vehicle value function;
Step 2-8: minimize L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;
Step 2-9: update the unmanned aerial vehicle and unmanned vehicle target strategy networks: θ'_u ← θ_u, θ'_v ← θ_v;
Step 2-10: repeat Steps 2-2 to 2-9 until the network training converges, so as to obtain the trained deep neural network model;
the probability selection submodule is used for selecting, from the output values of the trained deep neural network model, the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value as the unmanned aerial vehicle actual control instruction, and selecting the combination of the unmanned vehicle linear speed control instruction and the unmanned vehicle angular speed control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction; the output values of the deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction being an unmanned aerial vehicle course deflection angle instruction and the unmanned vehicle control instruction being a combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction.
The functions of the machine-learning-based air-ground cooperative communication service system of this embodiment correspond to the machine-learning-based air-ground cooperative communication service method described above, so a detailed description is omitted here; reference may be made to the method embodiments above.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. A method for air-ground cooperative communication service based on machine learning is characterized by comprising the following steps:
acquiring environment information of each unmanned aerial vehicle and unmanned vehicles in communication service;
and step two, inputting the environmental information into a pre-trained deep neural network model, and resolving to obtain a cooperative communication service strategy instruction of the unmanned aerial vehicle and the unmanned vehicle.
2. The air-ground cooperative communication service method based on machine learning according to claim 1, wherein the environment information corresponding to each unmanned aerial vehicle in the step one comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
3. The air-ground cooperative communication service method based on machine learning of claim 2, wherein the user status information in step one comprises a plurality of user location information with minimum ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, communication average service quality of all users and communication service quality standard deviation; the calculation formula of the ranking factor is as follows:
ρ_k = λ_1·(d_ik/d_max) + λ_2·α_ik + λ_3·(Q_t(k)/Q_max)

where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the included angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it with user k; Q_t(k) represents the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; λ_1, λ_2, λ_3 are proportionality coefficients.
4. The air-ground cooperative communication service method based on machine learning according to claim 3, wherein the process of deep neural network model pre-training in the second step comprises:
Step 2-1: initialize the communication service strategies π_{θ_u} and π_{θ_v} of the unmanned aerial vehicles and unmanned vehicles, their target strategies π_{θ'_u} and π_{θ'_v}, and the value networks V_{φ_u} and V_{φ_v}; make the strategy network of each unmanned aerial vehicle, π_{θ_u}, identical to its target network, π_{θ'_u}, i.e. θ'_u = θ_u, and at the same time make the strategy network of each unmanned vehicle, π_{θ_v}, identical to its target network, π_{θ'_v}, i.e. θ'_v = θ_v;
Step 2-2: in each interaction period, the unmanned aerial vehicles and unmanned vehicles respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} from the environment, where o_t(u_i) represents the environment information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environment information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environment information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environment information observed by unmanned vehicle j at time t+1;
Step 2-3: calculate the advantage functions using the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

A_t(u_i) = r_{t+1}(u_i) + γ·V_{φ_u}(o_{t+1}(u_i)) − V_{φ_u}(o_t(u_i))

A_t(v_j) = r_{t+1}(v_j) + γ·V_{φ_v}(o_{t+1}(v_j)) − V_{φ_v}(o_t(v_j))

where A_t(u_i) and A_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j; γ is a discount factor with value in (0, 1);
Step 2-4: repeat Steps 2-2 and 2-3 until the set maximum step length T is reached;
Step 2-5: using the interaction data collected in Steps 2-2 to 2-4 and the computed advantage functions, calculate the strategy loss values of the unmanned aerial vehicles and unmanned vehicles as:

L^CLIP(θ_u) = −E_t[min(r_t^i(θ_u)·A_t(u_i), clip(r_t^i(θ_u), 1−ε, 1+ε)·A_t(u_i))]

L^CLIP(θ_v) = −E_t[min(r_t^j(θ_v)·A_t(v_j), clip(r_t^j(θ_v), 1−ε, 1+ε)·A_t(v_j))]

where L^CLIP(θ_u) and L^CLIP(θ_v) respectively represent the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned vehicle; ε is a constant with value range (0, 1);

r_t^i(θ_u) = π_{θ_u}(a_t(u_i) | o_t(u_i)) / π_{θ'_u}(a_t(u_i) | o_t(u_i))

is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, and

r_t^j(θ_v) = π_{θ_v}(a_t(v_j) | o_t(v_j)) / π_{θ'_v}(a_t(v_j) | o_t(v_j))

is the ratio of the actual strategy to the target strategy of the unmanned vehicle;
Step 2-6: minimize L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service strategy networks of the unmanned aerial vehicles and unmanned vehicles;
Step 2-7: using the interaction data collected in Steps 2-2 to 2-4, calculate the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φ_u) = E_t[(r_{t+1}(u_i) + γ·V_{φ_u}(o_{t+1}(u_i)) − V_{φ_u}(o_t(u_i)))²]

L^V(φ_v) = E_t[(r_{t+1}(v_j) + γ·V_{φ_v}(o_{t+1}(v_j)) − V_{φ_v}(o_t(v_j)))²]

where L^V(φ_u) is the loss value of the unmanned aerial vehicle value function and L^V(φ_v) is the loss value of the unmanned vehicle value function;
Step 2-8: minimize L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;
Step 2-9: update the unmanned aerial vehicle and unmanned vehicle target strategy networks: θ'_u ← θ_u, θ'_v ← θ_v;
Step 2-10: repeat Steps 2-2 to 2-9 until the network training converges, so as to obtain the trained deep neural network model.
5. The air-ground cooperative communication service method based on machine learning according to claim 4, wherein the specific process of obtaining cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle by solving in the step two comprises: the output value of the trained deep neural network model comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and the unmanned vehicle control instruction is the combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction; and selecting the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value as an unmanned aerial vehicle actual control instruction, and selecting the combination of the unmanned vehicle linear speed control instruction and the unmanned vehicle angular speed control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction.
6. An air-ground cooperative communication service system based on machine learning, comprising:
the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and the unmanned vehicles in the communication service;
and the instruction resolving module is used for inputting the environment information into a pre-trained deep neural network model and resolving to obtain cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle.
7. The air-ground cooperative communication service system based on machine learning according to claim 6, wherein the environment information corresponding to each unmanned aerial vehicle in the data acquisition module comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
8. The air-ground cooperative communication service system based on machine learning of claim 7, wherein the user status information in the data acquisition module comprises a plurality of user position information with minimum ranking factor relative to the current unmanned aerial vehicle or unmanned aerial vehicle, communication average service quality of all users and communication service quality standard deviation; the calculation formula of the ranking factor is as follows:
ρ_k = λ_1·(d_ik/d_max) + λ_2·α_ik + λ_3·(Q_t(k)/Q_max)

where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the included angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it with user k; Q_t(k) represents the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; λ_1, λ_2, λ_3 are proportionality coefficients.
9. The air-ground cooperative communication service system based on machine learning of claim 8, wherein the instruction resolving module comprises a model training submodule for pre-training a deep neural network model, and the pre-training process comprises:
Step 2-1: initialize the communication service strategies π_{θ_u} and π_{θ_v} of the unmanned aerial vehicles and unmanned vehicles, their target strategies π_{θ'_u} and π_{θ'_v}, and the value networks V_{φ_u} and V_{φ_v}; make the strategy network of each unmanned aerial vehicle, π_{θ_u}, identical to its target network, π_{θ'_u}, i.e. θ'_u = θ_u, and at the same time make the strategy network of each unmanned vehicle, π_{θ_v}, identical to its target network, π_{θ'_v}, i.e. θ'_v = θ_v;
Step 2-2: in each interaction period, the unmanned aerial vehicles and unmanned vehicles respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} from the environment, where o_t(u_i) represents the environment information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environment information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environment information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environment information observed by unmanned vehicle j at time t+1;
Step 2-3: calculate the advantage functions using the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

A_t(u_i) = r_{t+1}(u_i) + γ·V_{φ_u}(o_{t+1}(u_i)) − V_{φ_u}(o_t(u_i))

A_t(v_j) = r_{t+1}(v_j) + γ·V_{φ_v}(o_{t+1}(v_j)) − V_{φ_v}(o_t(v_j))

where A_t(u_i) and A_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j; γ is a discount factor with value in (0, 1);
Step 2-4: repeat Steps 2-2 and 2-3 until the set maximum step length T is reached;
Step 2-5: using the interaction data collected in Steps 2-2 to 2-4 and the computed advantage functions, calculate the strategy loss values of the unmanned aerial vehicles and unmanned vehicles as:

L^CLIP(θ_u) = −E_t[min(r_t^i(θ_u)·A_t(u_i), clip(r_t^i(θ_u), 1−ε, 1+ε)·A_t(u_i))]

L^CLIP(θ_v) = −E_t[min(r_t^j(θ_v)·A_t(v_j), clip(r_t^j(θ_v), 1−ε, 1+ε)·A_t(v_j))]

where L^CLIP(θ_u) and L^CLIP(θ_v) respectively represent the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned vehicle; ε is a constant with value range (0, 1);

r_t^i(θ_u) = π_{θ_u}(a_t(u_i) | o_t(u_i)) / π_{θ'_u}(a_t(u_i) | o_t(u_i))

is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, and

r_t^j(θ_v) = π_{θ_v}(a_t(v_j) | o_t(v_j)) / π_{θ'_v}(a_t(v_j) | o_t(v_j))

is the ratio of the actual strategy to the target strategy of the unmanned vehicle;
Step 2-6: minimize L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service strategy networks of the unmanned aerial vehicles and unmanned vehicles;
Step 2-7: using the interaction data collected in Steps 2-2 to 2-4, calculate the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φ_u) = E_t[(r_{t+1}(u_i) + γ·V_{φ_u}(o_{t+1}(u_i)) − V_{φ_u}(o_t(u_i)))²]

L^V(φ_v) = E_t[(r_{t+1}(v_j) + γ·V_{φ_v}(o_{t+1}(v_j)) − V_{φ_v}(o_t(v_j)))²]

where L^V(φ_u) is the loss value of the unmanned aerial vehicle value function and L^V(φ_v) is the loss value of the unmanned vehicle value function;
Step 2-8: minimize L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;
Step 2-9: update the unmanned aerial vehicle and unmanned vehicle target strategy networks: θ'_u ← θ_u, θ'_v ← θ_v;
Step 2-10: repeat Steps 2-2 to 2-9 until the network training converges, so as to obtain the trained deep neural network model.
10. The air-ground cooperative communication service system based on machine learning of claim 9, wherein the instruction resolving module further comprises a probability selection sub-module, the probability selection sub-module being configured to select, from the trained deep neural network model output values, the unmanned aerial vehicle heading deflection angle instruction corresponding to the maximum probability value as the unmanned aerial vehicle actual control instruction, and to select the combination of the unmanned vehicle linear velocity control instruction and the unmanned vehicle angular velocity control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction; the output values of the deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, where an unmanned aerial vehicle control instruction is a heading deflection angle instruction and an unmanned vehicle control instruction is a combination of a linear velocity control instruction and an angular velocity control instruction.
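The probability selection sub-module of claim 10 amounts to an argmax over each agent's discrete action distribution. A minimal sketch (array names are illustrative, not from the filing):

```python
import numpy as np

def select_actions(uav_heading_probs, ugv_pair_probs):
    """Return the index of the highest-probability UAV heading-deflection
    instruction and the index of the highest-probability unmanned-vehicle
    (linear velocity, angular velocity) instruction pair."""
    return int(np.argmax(uav_heading_probs)), int(np.argmax(ugv_pair_probs))
```

At deployment time each returned index is looked up in the corresponding discrete instruction table to produce the actual control commands.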
CN202111271084.9A 2021-10-29 2021-10-29 Air-ground cooperative communication service method and system based on machine learning Active CN114020016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111271084.9A CN114020016B (en) 2021-10-29 2021-10-29 Air-ground cooperative communication service method and system based on machine learning

Publications (2)

Publication Number Publication Date
CN114020016A true CN114020016A (en) 2022-02-08
CN114020016B CN114020016B (en) 2022-06-21

Family

ID=80058717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111271084.9A Active CN114020016B (en) 2021-10-29 2021-10-29 Air-ground cooperative communication service method and system based on machine learning

Country Status (1)

Country Link
CN (1) CN114020016B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229685A (en) * 2016-12-14 2018-06-29 中国航空工业集团公司西安航空计算技术研究所 An integrated air-ground unmanned intelligent decision-making method
CN110650039A (en) * 2019-09-17 2020-01-03 沈阳航空航天大学 Multimodal optimization-based network collaborative communication model for unmanned aerial vehicle cluster-assisted vehicle
CN110874578A (en) * 2019-11-15 2020-03-10 北京航空航天大学青岛研究院 Unmanned aerial vehicle visual angle vehicle identification and tracking method based on reinforcement learning
CN111300372A (en) * 2020-04-02 2020-06-19 同济人工智能研究院(苏州)有限公司 Air-ground cooperative intelligent inspection robot and inspection method
CN111628818A (en) * 2020-05-15 2020-09-04 哈尔滨工业大学 Distributed real-time communication method and device for air-ground unmanned system and multi-unmanned system
CN112068549A (en) * 2020-08-07 2020-12-11 哈尔滨工业大学 Unmanned system cluster control method based on deep reinforcement learning
CN112965514A (en) * 2021-01-29 2021-06-15 北京农业智能装备技术研究中心 Air-ground cooperative pesticide application method and system
CN113029169A (en) * 2021-03-03 2021-06-25 宁夏大学 Air-ground cooperative search and rescue system and method based on three-dimensional map and autonomous navigation
CN113050678A (en) * 2021-03-02 2021-06-29 山东罗滨逊物流有限公司 Autonomous cooperative control method and system based on artificial intelligence
CN113160554A (en) * 2021-02-02 2021-07-23 上海大学 Air-ground cooperative traffic management system and method based on Internet of vehicles

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Siquan et al., "Heterogeneous time-varying formation tracking control of UAVs and unmanned vehicles for air-ground cooperative operations", Aero Weaponry *
Xu Wenjing, "Dynamic cooperative design of UAVs and unmanned vehicles in uncertain environments", Journal of Luoyang Institute of Science and Technology (Natural Science Edition) *

Also Published As

Publication number Publication date
CN114020016B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
Bayerlein et al. UAV path planning for wireless data harvesting: A deep reinforcement learning approach
CN109547938B (en) Trajectory planning method for unmanned aerial vehicle in wireless sensor network
CN110049566B (en) Downlink power distribution method based on multi-unmanned-aerial-vehicle auxiliary communication network
CN105841702A (en) Method for planning routes of multi-unmanned aerial vehicles based on particle swarm optimization algorithm
Dai et al. Mobile crowdsensing for data freshness: A deep reinforcement learning approach
CN115278729B (en) Unmanned plane cooperation data collection and data unloading method in ocean Internet of things
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN116974751A (en) Task scheduling method based on multi-agent auxiliary edge cloud server
CN113055078A (en) Effective information age determination method and unmanned aerial vehicle flight trajectory optimization method
CN117289691A (en) Training method for path planning agent for reinforcement learning in navigation scene
CN107786989B (en) Lora intelligent water meter network gateway deployment method and device
Du et al. Virtual relay selection in LTE-V: A deep reinforcement learning approach to heterogeneous data
Chen et al. A fast coordination approach for large-scale drone swarm
CN114020016B (en) Air-ground cooperative communication service method and system based on machine learning
CN114895710A (en) Control method and system for autonomous behavior of unmanned aerial vehicle cluster
Cui et al. Model-free based automated trajectory optimization for UAVs toward data transmission
Zeng et al. The study of DDPG based spatiotemporal dynamic deployment optimization of Air-Ground ad hoc network for disaster emergency response
CN115809751B (en) Two-stage multi-robot environment coverage method and system based on reinforcement learning
Zhang et al. Trajectory design for UAV-based inspection system: A deep reinforcement learning approach
CN111880568A (en) Optimization training method, device and equipment for automatic control of unmanned aerial vehicle and storage medium
CN114520991B (en) Unmanned aerial vehicle cluster-based edge network self-adaptive deployment method
Bhandarkar et al. User coverage maximization for a uav-mounted base station using reinforcement learning and greedy methods
CN113741418B (en) Method and device for generating cooperative paths of heterogeneous vehicle and machine formation
CN114594793A (en) Path planning method for base station unmanned aerial vehicle
CN113919188B (en) Relay unmanned aerial vehicle path planning method based on context-MAB

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant