CN114020016A - Air-ground cooperative communication service method and system based on machine learning - Google Patents
- Publication number: CN114020016A
- Application number: CN202111271084.9A
- Authority
- CN
- China
- Prior art keywords: unmanned aerial vehicle, unmanned vehicle, communication service
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Abstract
An air-ground cooperative communication service method and system based on machine learning relates to the technical field of air-ground cooperative communication service and is used for solving the problems of low service quality and low efficiency caused by communication services in the prior art being provided by unmanned aerial vehicles alone. The technical points of the invention comprise: acquiring environment information of each unmanned aerial vehicle and each unmanned vehicle in the communication service; and inputting the environment information into a pre-trained deep neural network model, and resolving to obtain cooperative communication service strategy instructions for the unmanned aerial vehicles and unmanned vehicles. The invention can solve the problem of communication between ground users and the outside, or among ground users, after a ground communication base station is damaged, and can compensate for an insufficient number of available mobile communication devices.
Description
Technical Field
The invention relates to the technical field of air-ground cooperative communication service, in particular to an air-ground cooperative communication service method and system based on machine learning.
Background
There are two main ways for unmanned aerial vehicles to provide communication services to ground users. The first is to use the unmanned aerial vehicle as a communication relay: the unmanned aerial vehicle forwards communication messages between users and a base station, thereby providing communication services to the users. In this service mode, the placement of the unmanned aerial vehicles and the assignment of communication channels to different users must be optimized. The general approach is to model these questions as a multi-objective optimization problem and solve it with convex optimization or intelligent algorithms to obtain the position distribution of the unmanned aerial vehicles and the channel allocation strategy. The second way is to use unmanned aerial vehicles to serve users directly: the unmanned aerial vehicle acts as an aerial communication base station and provides communication services to users without an intermediary. In this mode, the dynamic trajectory of the unmanned aerial vehicle must be optimized to provide high-quality communication services. Common practice is to model this as an optimization problem whose constraints include energy and maximum data throughput, with the objective of providing the best possible communication service to users; the usual solution methods are convex optimization and certain machine learning methods.
Although these approaches enable unmanned aerial vehicles to provide high-quality communication services in some simple environments, several problems remain. First, in practice the number of available unmanned aerial vehicles is limited, and so are the services they can provide, making it difficult to serve users distributed across varied environments well. Second, because communication links can be blocked by obstacles in the environment, users behind obstacles have difficulty obtaining service; consequently, unmanned aerial vehicles alone cannot provide high-quality communication services to all users in wide-area, complex environments.
Disclosure of Invention
In view of the above problems, the present invention provides an air-ground cooperative communication service method and system based on machine learning, so as to solve the problems of low service quality and low efficiency caused by communication services in the prior art being provided by unmanned aerial vehicles alone.
According to an aspect of the present invention, a method for air-ground cooperative communication service based on machine learning is provided, the method comprising the following steps:
step one, acquiring environment information of each unmanned aerial vehicle and each unmanned vehicle in the communication service;
and step two, inputting the environmental information into a pre-trained deep neural network model, and resolving to obtain a cooperative communication service strategy instruction of the unmanned aerial vehicle and the unmanned vehicle.
Further, the environment information corresponding to each unmanned aerial vehicle in step one comprises user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of the several unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned vehicle, and position information of the several unmanned vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
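As an illustration of how one agent's observation described above might be assembled, the following sketch concatenates the listed pieces into a flat vector; the function name, argument layout, and ordering are assumptions for illustration, not the patent's specification:

```python
import numpy as np

def build_observation(user_polar, nearest_uav_polar, nearest_ugv_polar,
                      mean_qos, std_qos):
    """Concatenate one agent's observation vector (illustrative layout).

    user_polar        -- (distance, angle) pairs of the 5 lowest-ranking-factor users
    nearest_uav_polar -- (distance, angle) pairs of the 3 nearest unmanned aerial vehicles
    nearest_ugv_polar -- (distance, angle) pairs of the 3 nearest unmanned vehicles
    mean_qos, std_qos -- average and standard deviation of all users' service quality
    """
    parts = [np.asarray(user_polar, dtype=float).ravel(),
             np.array([mean_qos, std_qos]),               # global QoS statistics
             np.asarray(nearest_uav_polar, dtype=float).ravel(),
             np.asarray(nearest_ugv_polar, dtype=float).ravel()]
    return np.concatenate(parts)
```

With 5 user pairs, 3 aerial-vehicle pairs, and 3 ground-vehicle pairs, this yields a 24-dimensional observation (5×2 + 2 + 3×2 + 3×2).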
Further, in step one, the user state information includes the position information of the several users having the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of communication service quality; the ranking factor is calculated as:

ρ_k = λ_1·(d_ik/d_max) + λ_2·(α_ik/π) + λ_3·(Q_k^t/Q_max)

where ρ_k denotes the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik denotes the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik denotes the angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it to user k; Q_k^t denotes the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; and λ_1, λ_2, λ_3 are scaling factors.
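A minimal sketch of the ranking factor, assuming a weighted sum of the three normalized terms (the patent's exact combination is not reproduced in the text; the function name and default weights here are illustrative):

```python
import math

def ranking_factor(d_ik, alpha_ik, q_k, d_max, q_max,
                   lam1=1.0, lam2=1.0, lam3=1.0):
    """Illustrative ranking factor for user k relative to one vehicle:
    a weighted sum of normalized distance, heading angle, and service
    quality. The exact combination is assumed, not taken from the patent."""
    return (lam1 * d_ik / d_max          # normalized distance
            + lam2 * alpha_ik / math.pi  # normalized heading angle
            + lam3 * q_k / q_max)        # normalized service quality
```

Under this form, a user that is nearby, ahead of the vehicle, and poorly served gets a small ranking factor and is therefore prioritized.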
Further, the pre-training process of the deep neural network model in step two includes:
Step 2.1, initialize the unmanned aerial vehicle and unmanned vehicle communication service policy networks π_{θ_u} and π_{θ_v}, the target policy networks π_{θ'_u} and π_{θ'_v}, and the unmanned aerial vehicle and unmanned vehicle value networks V_{φ_u} and V_{φ_v}; make the unmanned aerial vehicle policy network identical to its target network, i.e. θ'_u = θ_u, and likewise make the unmanned vehicle policy network identical to its target network, i.e. θ'_v = θ_v;
Step two, in each interaction period, the unmanned aerial vehicle and the unmanned vehicle respectively collect interaction data { o } with the environmentt(ui),at(ui),rt+1(ui),ot+1(ui) And { o }t(vj),at(vj),rt+1(vj),ot+1(vj) In which o ist(ui) Representing the environmental information observed by drone i at time t,representing the action command executed by the unmanned aerial vehicle i at time t, rt+1(ui) Indicating the prize value, o, received by drone i at time t +1t+1(ui) Representing environmental information observed by the unmanned aerial vehicle i at the moment t + 1; ot(vj) Representing environmental information observed by the unmanned vehicle j at time t,indicates the action command, r, executed by the unmanned vehicle j at the time tt+1(vj) Indicating the reward value, o, received by the unmanned vehicle j at time t +1t+1(vj) Representing the environmental information observed by the unmanned vehicle j at the moment t + 1;
Step 2.3, calculate advantage functions from the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

A_t(u_i) = Σ_{l=0}^{T−t−1} γ^l·r_{t+l+1}(u_i) − V_{φ_u}(o_t(u_i))
A_t(v_j) = Σ_{l=0}^{T−t−1} γ^l·r_{t+l+1}(v_j) − V_{φ_v}(o_t(v_j))

where A_t(u_i) and A_t(v_j) denote the advantage functions of unmanned aerial vehicle i and unmanned vehicle j respectively, and γ is a discount factor between (0, 1);
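The advantage computation of step 2.3 can be sketched as follows, assuming a Monte-Carlo estimate (discounted return minus the value-network baseline); generalized advantage estimation would be an equally plausible reading of the patent:

```python
def advantages(rewards, values, gamma=0.99):
    """Monte-Carlo advantage estimate over one collected trajectory:
    A_t = (r_{t+1} + gamma * r_{t+2} + ...) - V(o_t).
    rewards[t] is the reward received after step t; values[t] is the
    value-network output for the observation at step t."""
    returns, g = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):   # accumulate discounted return
        g = rewards[t] + gamma * g
        returns[t] = g
    return [g_t - v_t for g_t, v_t in zip(returns, values)]
```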
Step 2.4, repeat steps 2.2 to 2.3 until the set maximum step length T is reached;
Step 2.5, using the interaction data collected above and the calculated advantage functions, obtain the loss values of the unmanned aerial vehicle policy and the unmanned vehicle policy:

L^CLIP(θ_u) = −E_t[min(r_t^i(θ_u)·A_t(u_i), clip(r_t^i(θ_u), 1−ε, 1+ε)·A_t(u_i))]
L^CLIP(θ_v) = −E_t[min(r_t^j(θ_v)·A_t(v_j), clip(r_t^j(θ_v), 1−ε, 1+ε)·A_t(v_j))]

where L^CLIP(θ_u) and L^CLIP(θ_v) denote the policy loss values of the unmanned aerial vehicle and the unmanned vehicle respectively; ε is a constant in the range (0, 1); r_t^i(θ_u) is the ratio of the unmanned aerial vehicle's actual policy to its target policy, and r_t^j(θ_v) is the ratio of the unmanned vehicle's actual policy to its target policy;
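The policy loss of step 2.5 matches the standard PPO clipped surrogate objective; a minimal sketch, negated because the text minimizes the loss:

```python
import numpy as np

def clipped_policy_loss(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate loss, minimized to update the policy:
    L = -mean( min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t) ).
    ratio[t] is the probability ratio of actual to target policy."""
    ratio, advantage = np.asarray(ratio), np.asarray(advantage)
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return -np.mean(np.minimum(ratio * advantage, clipped * advantage))
```

The clipping keeps each update close to the target policy, which is what gives the training its stability.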
Step 2.6, minimize L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service policy networks of the unmanned aerial vehicle and the unmanned vehicle;
Step 2.7, using the interaction data collected above, calculate the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function:

L^V(φ_u) = E_t[(V_{φ_u}(o_t(u_i)) − R_t(u_i))²]
L^V(φ_v) = E_t[(V_{φ_v}(o_t(v_j)) − R_t(v_j))²]

where L^V(φ_u) is the loss value of the unmanned aerial vehicle value function, L^V(φ_v) is the loss value of the unmanned vehicle value function, and R_t denotes the discounted return;
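The value-function loss of step 2.7 is a mean-squared error against the empirical returns; a minimal sketch:

```python
import numpy as np

def value_loss(values, returns):
    """Mean-squared error between value-network predictions V(o_t)
    and the empirical discounted returns R_t."""
    values, returns = np.asarray(values), np.asarray(returns)
    return np.mean((values - returns) ** 2)
```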
Step 2.8, minimize L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;
Step 2.9, update the unmanned aerial vehicle target policy network and the unmanned vehicle target policy network: θ'_u ← θ_u, θ'_v ← θ_v;
Step 2.10, repeat steps 2.2 to 2.9 until the network training converges, obtaining the trained deep neural network model.
Further, the specific process in step two of resolving the cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle includes: the output values of the trained deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, where an unmanned aerial vehicle control instruction is a heading deflection angle instruction and an unmanned vehicle control instruction is a combination of a linear velocity control instruction and an angular velocity control instruction; the unmanned aerial vehicle heading deflection angle instruction with the maximum probability value is selected as the unmanned aerial vehicle's actual control instruction, and the combination of linear velocity and angular velocity control instructions with the maximum probability value is selected as the unmanned vehicle's actual control instruction.
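The maximum-probability instruction selection can be sketched as follows; the command tables passed in are illustrative placeholders, not the patent's discretization:

```python
import numpy as np

def select_commands(uav_probs, ugv_probs, uav_headings, ugv_commands):
    """Pick the highest-probability heading-deflection command for the
    unmanned aerial vehicle and the highest-probability (linear, angular)
    velocity pair for the unmanned vehicle."""
    heading = uav_headings[int(np.argmax(uav_probs))]
    velocity_pair = ugv_commands[int(np.argmax(ugv_probs))]
    return heading, velocity_pair
```

For example, with hypothetical heading options `[-15, 0, 15]` degrees and velocity pairs `[(1.0, 0.0), (0.5, 0.1)]`, the arg-max of each probability vector picks one entry from each table.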
According to another aspect of the present invention, a machine learning-based air-ground cooperative communication service system is provided, which includes:
the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and the unmanned vehicles in the communication service;
and the instruction resolving module is used for inputting the environment information into a pre-trained deep neural network model and resolving to obtain cooperative communication service strategy instructions for the unmanned aerial vehicle and the unmanned vehicle.
Further, the environment information corresponding to each unmanned aerial vehicle in the data acquisition module includes user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of the several unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in the communication service area, position information of the several unmanned aerial vehicles nearest to the current unmanned vehicle, and position information of the several unmanned vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
Further, the user state information in the data acquisition module includes the position information of the several users having the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of communication service quality; the ranking factor is calculated as:

ρ_k = λ_1·(d_ik/d_max) + λ_2·(α_ik/π) + λ_3·(Q_k^t/Q_max)

where ρ_k denotes the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik denotes the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik denotes the angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it to user k; Q_k^t denotes the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; and λ_1, λ_2, λ_3 are scaling factors.
Further, the instruction resolving module comprises a model training submodule for pre-training the deep neural network model; the pre-training process comprises the following steps:
Step 2.1, initialize the unmanned aerial vehicle and unmanned vehicle communication service policy networks π_{θ_u} and π_{θ_v}, the target policy networks π_{θ'_u} and π_{θ'_v}, and the unmanned aerial vehicle and unmanned vehicle value networks V_{φ_u} and V_{φ_v}; make the unmanned aerial vehicle policy network identical to its target network, i.e. θ'_u = θ_u, and likewise make the unmanned vehicle policy network identical to its target network, i.e. θ'_v = θ_v;
Step 2.2, in each interaction period, the unmanned aerial vehicles and unmanned vehicles respectively collect interaction data with the environment, {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)}, where o_t(u_i) denotes the environment information observed by unmanned aerial vehicle i at time t, a_t(u_i) the action command it executes at time t, r_{t+1}(u_i) the reward value it receives at time t+1, and o_{t+1}(u_i) the environment information it observes at time t+1; o_t(v_j), a_t(v_j), r_{t+1}(v_j) and o_{t+1}(v_j) denote the corresponding quantities for unmanned vehicle j;
Step 2.3, calculate advantage functions from the collected interaction data; the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:
A_t(u_i) = Σ_{l=0}^{T−t−1} γ^l·r_{t+l+1}(u_i) − V_{φ_u}(o_t(u_i)), A_t(v_j) = Σ_{l=0}^{T−t−1} γ^l·r_{t+l+1}(v_j) − V_{φ_v}(o_t(v_j)),
where γ is a discount factor between (0, 1);
Step 2.4, repeat steps 2.2 to 2.3 until the set maximum step length T is reached;
Step 2.5, using the interaction data collected above and the calculated advantage functions, obtain the loss values of the unmanned aerial vehicle policy and the unmanned vehicle policy:
L^CLIP(θ_u) = −E_t[min(r_t^i(θ_u)·A_t(u_i), clip(r_t^i(θ_u), 1−ε, 1+ε)·A_t(u_i))], L^CLIP(θ_v) = −E_t[min(r_t^j(θ_v)·A_t(v_j), clip(r_t^j(θ_v), 1−ε, 1+ε)·A_t(v_j))],
where ε is a constant in the range (0, 1), r_t^i(θ_u) is the ratio of the unmanned aerial vehicle's actual policy to its target policy, and r_t^j(θ_v) is the ratio of the unmanned vehicle's actual policy to its target policy;
Step 2.6, minimize L^CLIP(θ_u) and L^CLIP(θ_v) to update the communication service policy networks of the unmanned aerial vehicle and the unmanned vehicle;
Step 2.7, using the interaction data collected above, calculate the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function:
L^V(φ_u) = E_t[(V_{φ_u}(o_t(u_i)) − R_t(u_i))²], L^V(φ_v) = E_t[(V_{φ_v}(o_t(v_j)) − R_t(v_j))²],
where R_t denotes the discounted return;
Step 2.8, minimize L^V(φ_u) and L^V(φ_v) to update the unmanned aerial vehicle and unmanned vehicle value networks;
Step 2.9, update the unmanned aerial vehicle target policy network and the unmanned vehicle target policy network: θ'_u ← θ_u, θ'_v ← θ_v;
Step 2.10, repeat steps 2.2 to 2.9 until the network training converges, obtaining the trained deep neural network model.
Furthermore, the instruction resolving module further comprises a probability selection submodule, which selects, from the trained deep neural network model's output values, the unmanned aerial vehicle heading deflection angle instruction with the maximum probability value as the unmanned aerial vehicle's actual control instruction, and the combination of unmanned vehicle linear velocity and angular velocity control instructions with the maximum probability value as the unmanned vehicle's actual control instruction; the output values of the deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, where an unmanned aerial vehicle control instruction is a heading deflection angle instruction and an unmanned vehicle control instruction is a combination of a linear velocity control instruction and an angular velocity control instruction.
The beneficial technical effects of the invention are as follows:
the invention provides a communication service for ground users by the cooperation of an unmanned aerial vehicle and an unmanned vehicle, which can solve the problem of mutual communication between the ground users and the outside or between the ground users after a ground communication base station is damaged, and can solve the problem of insufficient availability of mobile communication equipment. Compared with the traditional communication service method, the method has the following advantages: 1) the communication service system is provided with a plurality of unmanned aerial vehicles and unmanned vehicles, and can provide high-quality and fair communication service for ground users; 2) by adding unmanned vehicles into the communication service system, the problem of insufficient quantity of available communication service unmanned vehicles can be solved; 3) the cooperative communication service strategy of the unmanned aerial vehicle and the unmanned aerial vehicle is trained by using a deep reinforcement learning method, so that the cooperative communication service strategy can adapt to the change of the environment, has higher robustness and stronger environment adaptability, and can execute communication service tasks in various complex environments; 4) the number of the unmanned aerial vehicles and the number of the unmanned aerial vehicles can be adapted to change, and meanwhile, the number of the ground users can be adapted to change.
Drawings
The present invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification, and which are used to further illustrate preferred embodiments of the present invention and to explain the principles and advantages of the present invention.
Fig. 1 is a schematic view of a communication service scenario between an unmanned vehicle and an unmanned aerial vehicle in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a deep neural network structure in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the reward value curve obtained during the cooperative strategy training of the unmanned aerial vehicles and unmanned vehicles in the embodiment of the invention.
Fig. 4 is a trajectory graph of the cooperative communication service of the unmanned aerial vehicles and unmanned vehicles in the embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, exemplary embodiments or examples of the disclosure are described below with reference to the accompanying drawings. It is obvious that the described embodiments or examples are only some, but not all embodiments or examples of the invention. All other embodiments or examples obtained by a person of ordinary skill in the art based on the embodiments or examples of the present invention without any creative effort shall fall within the protection scope of the present invention.
In order to solve the problem of communication service of ground users, the invention provides an air-ground cooperative communication service method based on machine learning.
The embodiment of the invention provides an air-ground cooperative communication service method based on machine learning, which specifically comprises the following steps:
the method comprises the following steps: the unmanned aerial vehicle and the unmanned aerial vehicle acquire environment information in communication service;
an unmanned vehicle and unmanned vehicle communication service scenario is shown in fig. 1. According to the embodiment of the invention, the environment information acquired by the unmanned aerial vehicleComprises three parts of contents, wherein,indicating unmanned plane uiObtaining status information of users in a communication service area, including relative drones u in the communication service areai5 user position information with minimum ranking factor, the position information being included at drone uiDistance d in the course coordinate systemijAngle alphaijJ-1, 2, 9, and the average quality of service of the communication for all usersAnd standard deviation of communication service qualityUser k is relative to unmanned aerial vehicle uiOf the order factor pkThe calculation is shown below:
in the formula (d)ikIndicating unmanned plane uiDistance relative to user k; alpha is alphaikIndicating unmanned plane uiSpeed direction and unmanned plane uiThe included angle between the user k and the connecting line;indicates the channels that user k has at time tThe quality of the trust service; dmax,QmaxTo normalize the coefficient, λ1,λ2,λ3Is a scaling factor.
The second part is the position information of the 3 unmanned aerial vehicles nearest to u_i, given in u_i's heading coordinate system as distances d_ij and angles α_ij, j = 1, 2, 3. The third part is the position information of the 3 unmanned vehicles nearest to u_i, likewise given as distances and angles in u_i's heading coordinate system.
The environment information perceived by an unmanned vehicle is similar to that perceived by an unmanned aerial vehicle and likewise comprises three parts: the user state perceived by unmanned vehicle c_j, comprising the position information of the 5 users with the smallest ranking factor relative to c_j, the average communication service quality of all users, and the standard deviation of communication service quality; the perceived unmanned aerial vehicle states, i.e. unmanned aerial vehicle position information; and the perceived states of the other unmanned vehicles, i.e. the other unmanned vehicles' position information.
Step two: inputting environmental information acquired by the unmanned aerial vehicle and the unmanned vehicle into a pre-trained deep neural network model, and resolving to obtain communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle;
According to the embodiment of the invention, the deep neural network structure is shown in fig. 2 and comprises a 3-layer fully connected network: the first and second layers each have 128 nodes with a rectified linear unit (ReLU) activation function, and the third layer has 7 output nodes with a SoftMax activation function, which limits the output values to (0, 1).
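A numpy sketch of the network just described (two 128-node ReLU layers and a 7-node SoftMax output); the weight initialization and input dimension are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden=128, out_dim=7):
    """Random weights for the 3-layer fully connected network described
    above (initialization scheme is an assumption)."""
    return [(rng.normal(0, 0.1, (in_dim, hidden)), np.zeros(hidden)),
            (rng.normal(0, 0.1, (hidden, hidden)), np.zeros(hidden)),
            (rng.normal(0, 0.1, (hidden, out_dim)), np.zeros(out_dim))]

def forward(params, x):
    """ReLU on the two hidden layers, SoftMax on the 7 output nodes."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = np.maximum(x @ w1 + b1, 0.0)   # layer 1, ReLU
    h = np.maximum(h @ w2 + b2, 0.0)   # layer 2, ReLU
    z = h @ w3 + b3                    # layer 3, logits
    e = np.exp(z - z.max())            # numerically stable SoftMax
    return e / e.sum()
```

The SoftMax output is a 7-way probability vector over the discrete control instructions.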
The deep neural network pre-training process comprises: collecting the interaction data of the unmanned aerial vehicles and unmanned vehicles with the environment, i.e. the environment information; estimating the advantage functions A_t(u_i) and A_t(v_j) from these data; then calculating the policy loss functions L^CLIP(θ_u), L^CLIP(θ_v) and the value-function losses L^V(φ_u), L^V(φ_v); and finally updating the policy networks and value networks by minimizing the policy losses and value-function losses, thereby obtaining a well-trained deep neural network model. The specific training process is as follows:
(1) communication service strategy for initializing unmanned aerial vehicle and unmanned vehicleAnd target strategyInitializing unmanned aerial vehicles and unmanned vehicle value networkAnd enable policy networks for dronesWith its target networkAre identical, i.e. thatAt the same time, make strategy network of unmanned vehicleWith its target networkAre identical, i.e. that
(2) In each time step, namely an interaction period, the unmanned aerial vehicle and the unmanned aerial vehicle respectively collect interaction data { o ] with the environmentt(ui),at(ui),rt+1(ui),ot+1(ui) And { o }t(vj),at(vj),rt+1(vj),ot+1(vj) In which o ist(ui) Representing the environmental information observed by drone i at time t,representing the action command executed by the unmanned aerial vehicle i at time t, rt+1(ui) Indicating the prize value, o, received by drone i at time t +1t+1(ui) Environmental information, o, representing the observation of unmanned aerial vehicle i at time t +1t(vj) Representing environmental information observed by the unmanned vehicle j at time t,indicates the action command, r, executed by the unmanned vehicle j at the time tt+1(vj) Indicating the reward value, o, received by the unmanned vehicle j at time t +1t+1(vj) Representing the environmental information observed by the unmanned vehicle j at the moment t + 1;
(3) and calculating an advantage function by using the collected interaction data, wherein the advantage functions of the unmanned aerial vehicle i and the unmanned vehicle j are calculated as follows:
in the formula,andrespectively representing the advantage functions of the unmanned aerial vehicle i and the unmanned aerial vehicle j, wherein gamma is a discount factor and is between (0 and 1);
(4) repeating the steps (2) and (3) until the set maximum step length T is reached;
(5) calculating loss values of the unmanned aerial vehicle strategy and the unmanned aerial vehicle strategy by using the interactive data collected in the steps (2), (3) and (4) and the calculated advantage function as follows:
in the formula, LCLIP(θu) And LCLIP(θv) Respectively representing the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned aerial vehicle, wherein the epsilon is a constant, and the value range is (0, 1);clip is a function, clip (r)i t(θu) 1-e, 1+ e) represents the sum of ri t(θu) Is limited to [ 1-e, 1+ e]To (c) to (d); r isi t(θu) Is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, ri t(θv) Respectively calculating the ratio of the actual strategy to the target strategy of the unmanned vehicle as follows:
(6) minimization of LCLIP(θu) And LCLIP(θv) Updating communication service strategy networks of the unmanned aerial vehicle and the unmanned aerial vehicle;
(7) calculating the loss values of the unmanned plane value function and the unmanned plane value function by using the interactive data collected in the steps (2), (3) and (4) as follows:
in the formula, LV(φu) Is the loss value of the unmanned aerial vehicle value function, LV(φv) Loss value of the unmanned vehicle value function;
(8) minimizing L^V(φu) and L^V(φv) to update the value networks of the unmanned aerial vehicle and the unmanned vehicle;
(9) updating the unmanned aerial vehicle target strategy network and the unmanned vehicle target strategy network: θ′u ← θu, θ′v ← θv;
(10) repeating steps (2) to (9) until the network training converges, obtaining the trained deep neural network model.
In the pre-training process, the reward value obtained by unmanned aerial vehicle u_i can be represented by the following equation:

r_t(u_i) = r_t^Q(u_i) + r_t^S(u_i) + r_t^R(u_i)

where the first term r_t^Q(u_i) relates to the communication service quality of the users: when the users have a higher average communication service quality and a lower variance of communication service quality, r_t^Q(u_i) is larger; otherwise it is smaller. The second term r_t^S(u_i) relates to the distances from unmanned aerial vehicle u_i to the other unmanned aerial vehicles and the unmanned vehicles: when the distance between unmanned aerial vehicles, or between an unmanned aerial vehicle and an unmanned vehicle, is small, r_t^S(u_i) is negative; otherwise it is 0. The third term r_t^R(u_i) relates to the position of unmanned aerial vehicle u_i within the communication service environment: when unmanned aerial vehicle u_i is inside the communication service area, r_t^R(u_i) is 0; otherwise it is negative.
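The three-term reward structure can be sketched as follows. The text only specifies the sign behaviour of each term, so the weights, the safety distance, and the penalty magnitudes below are illustrative assumptions.

```python
def uav_reward(mean_quality, quality_std, min_separation, in_service_area,
               safe_distance=20.0, collision_penalty=-1.0, region_penalty=-1.0):
    """Three-term reward r_t = r_Q + r_S + r_R for a UAV.

    r_Q: rises with average service quality, falls with its spread (fairness);
    r_S: negative when the nearest UAV/unmanned vehicle is too close, else 0;
    r_R: 0 inside the communication service area, negative outside it.
    """
    r_q = mean_quality - quality_std
    r_s = collision_penalty if min_separation < safe_distance else 0.0
    r_r = 0.0 if in_service_area else region_penalty
    return r_q + r_s + r_r
```

The unmanned vehicle reward would follow the same pattern with its own separation and area checks.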
The reward function of the unmanned vehicle is designed in the same way as that of the unmanned aerial vehicle. The communication service strategies of the unmanned aerial vehicles and unmanned vehicles are trained by deep reinforcement learning: through continuous interaction with the environment, the unmanned aerial vehicles and unmanned vehicles learn an effective cooperative communication service strategy and can provide high-quality and fair communication services for ground users. Pseudocode for the specific implementation process is shown in Table 1.
The environmental information of the unmanned aerial vehicles and unmanned vehicles, collected in real time, is fed into the trained deep neural network model. The model's output values comprise the probability of selecting each unmanned aerial vehicle control command and the probability of selecting each unmanned vehicle control command. The unmanned aerial vehicle control command is a heading deflection angle command, in degrees; the unmanned vehicle control command is a combination of a linear velocity control command and an angular velocity control command. Finally, the heading deflection angle with the maximum probability is selected as the actual control command of the unmanned aerial vehicle, and the linear velocity and angular velocity combination with the maximum probability is selected as the actual control command of the unmanned vehicle.
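The greedy selection of actual control commands from the model's output probabilities can be sketched as follows; the candidate heading angles and velocity pairs are illustrative.

```python
import numpy as np

def select_actions(uav_probs, ugv_probs, headings, velocity_pairs):
    """Greedy action selection: the heading deflection angle with the highest
    probability becomes the UAV's actual control command; the (linear, angular)
    velocity pair with the highest probability becomes the unmanned vehicle's.
    """
    uav_cmd = headings[int(np.argmax(uav_probs))]
    ugv_cmd = velocity_pairs[int(np.argmax(ugv_probs))]
    return uav_cmd, ugv_cmd
```

For example, with heading candidates [-30, 0, 30] degrees and probabilities [0.1, 0.7, 0.2], the selected UAV command is 0 degrees.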
The beneficial effects of the invention are further verified through experiments.
The correctness and rationality of the invention are verified by digital simulation. First, a communication service environment of size 500m × 500m × 150m is constructed in a Python environment, containing 10 users and a dynamic communication service system composed of a plurality of unmanned aerial vehicles and unmanned vehicles. The unmanned aerial vehicles fly at constant speed and constant altitude with a flight speed of 10 m/s; the maximum speed of the unmanned vehicles is 10 m/s; the maximum moving speed of a user is 1 m/s, and users move randomly in the communication service area. The simulation software environment is Windows 10 + Python 3.7, and the hardware environment is an AMD Ryzen 5 3550H CPU + 16.0 GB RAM.
The experiment first verifies whether the communication service control strategy training of the unmanned aerial vehicle and the unmanned aerial vehicle is convergent. 10000 training rounds are performed in the experiment, the average reward value obtained by the unmanned aerial vehicle and the unmanned aerial vehicle in each 100 training rounds is recorded, and a curve is drawn as shown in fig. 3. As can be seen from fig. 3, as the training progresses, the drone and the unmanned vehicle can obtain a stable reward value, which is between 6.5 and 7, indicating that the communication service strategies of the drone and the unmanned vehicle approach convergence, and the drone and the unmanned vehicle can provide high-quality and fair communication service for the user.
And then carrying out experimental verification on the cooperation strategy of the unmanned aerial vehicle and the unmanned vehicle, wherein the verification result is shown in figure 4. As can be seen from fig. 4, the unmanned aerial vehicle and the unmanned vehicle can provide communication services for different users respectively, and the provided communication services are relatively uniform, that is, the unmanned aerial vehicle and the unmanned vehicle can cooperate to provide fair communication services for users on the ground.
The invention provides communication services for ground users through the cooperation of unmanned aerial vehicles and unmanned vehicles, and can solve the problem of communication between ground users and the outside world, or among ground users, after a disaster or when ground communication base stations are damaged. Meanwhile, the cooperation of unmanned aerial vehicles and unmanned vehicles can alleviate the shortage of available mobile communication equipment and bring the communication service advantages of both into play. Compared with traditional communication service strategies, the learning-based air-ground cooperative communication service strategy provided by the invention has the following advantages: 1) the communication service system is provided with a plurality of unmanned aerial vehicles and unmanned vehicles and can provide high-quality and fair communication services for ground users; 2) adding unmanned vehicles to the communication service system can remedy the insufficient number of available communication service unmanned aerial vehicles; 3) the cooperative communication service strategy of the unmanned aerial vehicles and unmanned vehicles is trained with a deep reinforcement learning method, so it can adapt to changes in the environment, has higher robustness and stronger environmental adaptability, and can execute communication service tasks in various complex environments. The air-ground cooperative communication service strategy provided by the invention can also adapt to changes in the number of unmanned aerial vehicles and unmanned vehicles, as well as changes in the number of ground users.
The method enables unmanned aerial vehicles and unmanned vehicles to cooperate in providing high-quality and fair communication services for ground users, and offers a new technical approach to post-disaster user communication services.
Another embodiment of the present invention provides an air-ground cooperative communication service system based on machine learning, including:
the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and each unmanned vehicle in the communication service; the environment information corresponding to each unmanned aerial vehicle comprises user state information in the communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in the communication service area, position information of a plurality of unmanned vehicles nearest to the current unmanned vehicle, and position information of a plurality of unmanned aerial vehicles nearest to the current unmanned vehicle; the position information comprises a distance parameter and an angle parameter; the user state information comprises position information of a plurality of users with the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of the communication service quality; the calculation formula of the ranking factor is as follows:
where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the included angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it with user k; Q_k^t indicates the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; and λ_1, λ_2, λ_3 are proportionality coefficients;
the instruction resolving module is used for inputting the environment information into a pre-trained deep neural network model and resolving to obtain cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle; the instruction resolving module comprises a model training submodule and a probability selection submodule;
the model training submodule is used for pre-training the deep neural network model, and the pre-training process comprises the following steps:
Step 2.1: initializing the communication service strategy networks π_θu and π_θv and the target strategy networks π_θ′u and π_θ′v of the unmanned aerial vehicle and the unmanned vehicle, and initializing the value networks V_φu and V_φv; making the strategy network of the unmanned aerial vehicle identical to its target network, i.e., θ′u = θu, and simultaneously making the strategy network of the unmanned vehicle identical to its target network, i.e., θ′v = θv;

Step 2.2: in each interaction period, the unmanned aerial vehicle and the unmanned vehicle respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} through interaction with the environment, where o_t(u_i) represents the environmental information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environmental information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environmental information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environmental information observed by unmanned vehicle j at time t+1;

Step 2.3: calculating advantage functions from the collected interaction data, where the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

Â_t(u_i) = r_{t+1}(u_i) + γV_φu(o_{t+1}(u_i)) − V_φu(o_t(u_i))

Â_t(v_j) = r_{t+1}(v_j) + γV_φv(o_{t+1}(v_j)) − V_φv(o_t(v_j))

where Â_t(u_i) and Â_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j, and γ is a discount factor with value in (0, 1);

Step 2.4: repeating Step 2.2 and Step 2.3 until the set maximum step length T is reached;

Step 2.5: using the interaction data collected in Steps 2.2 to 2.4 and the calculated advantage functions, calculating the loss values of the unmanned aerial vehicle strategy and the unmanned vehicle strategy as:

L^CLIP(θu) = −E_t[min(r_t^i(θu)Â_t(u_i), clip(r_t^i(θu), 1−ε, 1+ε)Â_t(u_i))]

L^CLIP(θv) = −E_t[min(r_t^j(θv)Â_t(v_j), clip(r_t^j(θv), 1−ε, 1+ε)Â_t(v_j))]

where L^CLIP(θu) and L^CLIP(θv) respectively represent the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned vehicle; ε is a constant with value range (0, 1); r_t^i(θu) is the ratio of the actual strategy of the unmanned aerial vehicle to its target strategy, and r_t^j(θv) is the ratio of the actual strategy of the unmanned vehicle to its target strategy;

Step 2.6: minimizing L^CLIP(θu) and L^CLIP(θv) to update the communication service strategy networks of the unmanned aerial vehicle and the unmanned vehicle;

Step 2.7: using the interaction data collected in Steps 2.2 to 2.4, calculating the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φu) = E_t[(V_φu(o_t(u_i)) − R_t(u_i))²]

L^V(φv) = E_t[(V_φv(o_t(v_j)) − R_t(v_j))²]

where L^V(φu) is the loss value of the unmanned aerial vehicle value function and L^V(φv) is the loss value of the unmanned vehicle value function;

Step 2.8: minimizing L^V(φu) and L^V(φv) to update the value networks of the unmanned aerial vehicle and the unmanned vehicle;

Step 2.9: updating the unmanned aerial vehicle target strategy network and the unmanned vehicle target strategy network: θ′u ← θu, θ′v ← θv;

Step 2.10: repeating Steps 2.2 to 2.9 until the network training converges, obtaining the trained deep neural network model;
the probability selection submodule is used for selecting, from the output values of the trained deep neural network model, the unmanned aerial vehicle heading deflection angle instruction corresponding to the maximum probability value as the unmanned aerial vehicle actual control instruction, and selecting the combination of the unmanned vehicle linear velocity control instruction and the unmanned vehicle angular velocity control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction; the output values of the deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction; the unmanned aerial vehicle control instruction is an unmanned aerial vehicle heading deflection angle instruction, and the unmanned vehicle control instruction is a combination of an unmanned vehicle linear velocity control instruction and an unmanned vehicle angular velocity control instruction.
The functions of the air-ground cooperative communication service system based on machine learning according to this embodiment correspond to the aforementioned air-ground cooperative communication service method based on machine learning; for details not repeated in this embodiment, reference may be made to the above method embodiments.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.
Claims (10)
1. A method for air-ground cooperative communication service based on machine learning is characterized by comprising the following steps:
step one, acquiring environment information of each unmanned aerial vehicle and each unmanned vehicle in the communication service;
step two, inputting the environment information into a pre-trained deep neural network model, and resolving to obtain cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle.
2. The air-ground cooperative communication service method based on machine learning according to claim 1, wherein the environment information corresponding to each unmanned aerial vehicle in step one comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in the communication service area, position information of a plurality of unmanned vehicles nearest to the current unmanned vehicle, and position information of a plurality of unmanned aerial vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
3. The air-ground cooperative communication service method based on machine learning according to claim 2, wherein the user state information in step one comprises position information of a plurality of users with the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of the communication service quality; the calculation formula of the ranking factor is as follows:

where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the included angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it with user k; Q_k^t indicates the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; and λ_1, λ_2, λ_3 are proportionality coefficients.
4. The air-ground cooperative communication service method based on machine learning according to claim 3, wherein the process of deep neural network model pre-training in the second step comprises:
Step 2.1: initializing the communication service strategy networks π_θu and π_θv and the target strategy networks π_θ′u and π_θ′v of the unmanned aerial vehicle and the unmanned vehicle, and initializing the value networks V_φu and V_φv; making the strategy network of the unmanned aerial vehicle identical to its target network, i.e., θ′u = θu, and simultaneously making the strategy network of the unmanned vehicle identical to its target network, i.e., θ′v = θv;

Step 2.2: in each interaction period, the unmanned aerial vehicle and the unmanned vehicle respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} through interaction with the environment, where o_t(u_i) represents the environmental information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environmental information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environmental information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environmental information observed by unmanned vehicle j at time t+1;

Step 2.3: calculating advantage functions from the collected interaction data, where the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

Â_t(u_i) = r_{t+1}(u_i) + γV_φu(o_{t+1}(u_i)) − V_φu(o_t(u_i))

Â_t(v_j) = r_{t+1}(v_j) + γV_φv(o_{t+1}(v_j)) − V_φv(o_t(v_j))

where Â_t(u_i) and Â_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j, and γ is a discount factor with value in (0, 1);

Step 2.4: repeating Step 2.2 and Step 2.3 until the set maximum step length T is reached;

Step 2.5: using the interaction data collected in Steps 2.2 to 2.4 and the calculated advantage functions, calculating the loss values of the unmanned aerial vehicle strategy and the unmanned vehicle strategy as:

L^CLIP(θu) = −E_t[min(r_t^i(θu)Â_t(u_i), clip(r_t^i(θu), 1−ε, 1+ε)Â_t(u_i))]

L^CLIP(θv) = −E_t[min(r_t^j(θv)Â_t(v_j), clip(r_t^j(θv), 1−ε, 1+ε)Â_t(v_j))]

where L^CLIP(θu) and L^CLIP(θv) respectively represent the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned vehicle; ε is a constant with value range (0, 1); r_t^i(θu) is the ratio of the actual strategy of the unmanned aerial vehicle to its target strategy, and r_t^j(θv) is the ratio of the actual strategy of the unmanned vehicle to its target strategy;

Step 2.6: minimizing L^CLIP(θu) and L^CLIP(θv) to update the communication service strategy networks of the unmanned aerial vehicle and the unmanned vehicle;

Step 2.7: using the interaction data collected in Steps 2.2 to 2.4, calculating the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φu) = E_t[(V_φu(o_t(u_i)) − R_t(u_i))²]

L^V(φv) = E_t[(V_φv(o_t(v_j)) − R_t(v_j))²]

where L^V(φu) is the loss value of the unmanned aerial vehicle value function and L^V(φv) is the loss value of the unmanned vehicle value function;

Step 2.8: minimizing L^V(φu) and L^V(φv) to update the value networks of the unmanned aerial vehicle and the unmanned vehicle;

Step 2.9: updating the unmanned aerial vehicle target strategy network and the unmanned vehicle target strategy network: θ′u ← θu, θ′v ← θv;

Step 2.10: repeating Steps 2.2 to 2.9 until the network training converges, obtaining the trained deep neural network model.
5. The air-ground cooperative communication service method based on machine learning according to claim 4, wherein the specific process of obtaining cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle by solving in the step two comprises: the output value of the trained deep neural network model comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and the unmanned vehicle control instruction is the combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction; and selecting the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value as an unmanned aerial vehicle actual control instruction, and selecting the combination of the unmanned vehicle linear speed control instruction and the unmanned vehicle angular speed control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction.
6. An air-ground cooperative communication service system based on machine learning, comprising:
the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and the unmanned vehicles in the communication service;
and the instruction resolving module is used for inputting the environment information into a pre-trained deep neural network model and resolving to obtain cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle.
7. The air-ground cooperative communication service system based on machine learning according to claim 6, wherein the environment information corresponding to each unmanned aerial vehicle in the data acquisition module comprises user state information in a communication service area, position information of a plurality of unmanned aerial vehicles nearest to the current unmanned aerial vehicle, and position information of a plurality of unmanned vehicles nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in the communication service area, position information of a plurality of unmanned vehicles nearest to the current unmanned vehicle, and position information of a plurality of unmanned aerial vehicles nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.
8. The air-ground cooperative communication service system based on machine learning according to claim 7, wherein the user state information in the data acquisition module comprises position information of a plurality of users with the smallest ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, the average communication service quality of all users, and the standard deviation of the communication service quality; the calculation formula of the ranking factor is as follows:

where ρ_k represents the ranking factor of user k relative to the unmanned aerial vehicle or unmanned vehicle; d_ik represents the distance of the unmanned aerial vehicle or unmanned vehicle from user k; α_ik represents the included angle between the velocity direction of the unmanned aerial vehicle or unmanned vehicle and the line connecting it with user k; Q_k^t indicates the communication service quality of user k at time t; d_max and Q_max are normalization coefficients; and λ_1, λ_2, λ_3 are proportionality coefficients.
9. The air-ground cooperative communication service system based on machine learning of claim 8, wherein the instruction resolving module comprises a model training submodule for pre-training a deep neural network model, and the pre-training process comprises:
Step 2.1: initializing the communication service strategy networks π_θu and π_θv and the target strategy networks π_θ′u and π_θ′v of the unmanned aerial vehicle and the unmanned vehicle, and initializing the value networks V_φu and V_φv; making the strategy network of the unmanned aerial vehicle identical to its target network, i.e., θ′u = θu, and simultaneously making the strategy network of the unmanned vehicle identical to its target network, i.e., θ′v = θv;

Step 2.2: in each interaction period, the unmanned aerial vehicle and the unmanned vehicle respectively collect interaction data {o_t(u_i), a_t(u_i), r_{t+1}(u_i), o_{t+1}(u_i)} and {o_t(v_j), a_t(v_j), r_{t+1}(v_j), o_{t+1}(v_j)} through interaction with the environment, where o_t(u_i) represents the environmental information observed by unmanned aerial vehicle i at time t, a_t(u_i) represents the action command executed by unmanned aerial vehicle i at time t, r_{t+1}(u_i) represents the reward value received by unmanned aerial vehicle i at time t+1, and o_{t+1}(u_i) represents the environmental information observed by unmanned aerial vehicle i at time t+1; o_t(v_j) represents the environmental information observed by unmanned vehicle j at time t, a_t(v_j) represents the action command executed by unmanned vehicle j at time t, r_{t+1}(v_j) represents the reward value received by unmanned vehicle j at time t+1, and o_{t+1}(v_j) represents the environmental information observed by unmanned vehicle j at time t+1;

Step 2.3: calculating advantage functions from the collected interaction data, where the advantage functions of unmanned aerial vehicle i and unmanned vehicle j are calculated as:

Â_t(u_i) = r_{t+1}(u_i) + γV_φu(o_{t+1}(u_i)) − V_φu(o_t(u_i))

Â_t(v_j) = r_{t+1}(v_j) + γV_φv(o_{t+1}(v_j)) − V_φv(o_t(v_j))

where Â_t(u_i) and Â_t(v_j) respectively represent the advantage functions of unmanned aerial vehicle i and unmanned vehicle j, and γ is a discount factor with value in (0, 1);

Step 2.4: repeating Step 2.2 and Step 2.3 until the set maximum step length T is reached;

Step 2.5: using the interaction data collected in Steps 2.2 to 2.4 and the calculated advantage functions, calculating the loss values of the unmanned aerial vehicle strategy and the unmanned vehicle strategy as:

L^CLIP(θu) = −E_t[min(r_t^i(θu)Â_t(u_i), clip(r_t^i(θu), 1−ε, 1+ε)Â_t(u_i))]

L^CLIP(θv) = −E_t[min(r_t^j(θv)Â_t(v_j), clip(r_t^j(θv), 1−ε, 1+ε)Â_t(v_j))]

where L^CLIP(θu) and L^CLIP(θv) respectively represent the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned vehicle; ε is a constant with value range (0, 1); r_t^i(θu) is the ratio of the actual strategy of the unmanned aerial vehicle to its target strategy, and r_t^j(θv) is the ratio of the actual strategy of the unmanned vehicle to its target strategy;

Step 2.6: minimizing L^CLIP(θu) and L^CLIP(θv) to update the communication service strategy networks of the unmanned aerial vehicle and the unmanned vehicle;

Step 2.7: using the interaction data collected in Steps 2.2 to 2.4, calculating the loss values of the unmanned aerial vehicle value function and the unmanned vehicle value function as:

L^V(φu) = E_t[(V_φu(o_t(u_i)) − R_t(u_i))²]

L^V(φv) = E_t[(V_φv(o_t(v_j)) − R_t(v_j))²]

where L^V(φu) is the loss value of the unmanned aerial vehicle value function and L^V(φv) is the loss value of the unmanned vehicle value function;

Step 2.8: minimizing L^V(φu) and L^V(φv) to update the value networks of the unmanned aerial vehicle and the unmanned vehicle;

Step 2.9: updating the unmanned aerial vehicle target strategy network and the unmanned vehicle target strategy network: θ′u ← θu, θ′v ← θv;

Step 2.10: repeating Steps 2.2 to 2.9 until the network training converges, obtaining the trained deep neural network model.
10. The air-ground cooperative communication service system based on machine learning according to claim 9, wherein the instruction resolving module further comprises a probability selection submodule, the probability selection submodule being configured to select, from the output values of the trained deep neural network model, the unmanned aerial vehicle heading deflection angle instruction corresponding to the maximum probability value as the unmanned aerial vehicle actual control instruction, and to select the combination of the unmanned vehicle linear velocity control instruction and the unmanned vehicle angular velocity control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction; the output values of the deep neural network model comprise the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction; the unmanned aerial vehicle control instruction is an unmanned aerial vehicle heading deflection angle instruction, and the unmanned vehicle control instruction is a combination of an unmanned vehicle linear velocity control instruction and an unmanned vehicle angular velocity control instruction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111271084.9A CN114020016B (en) | 2021-10-29 | 2021-10-29 | Air-ground cooperative communication service method and system based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114020016A true CN114020016A (en) | 2022-02-08 |
CN114020016B CN114020016B (en) | 2022-06-21 |
Family
ID=80058717
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111271084.9A Active CN114020016B (en) | 2021-10-29 | 2021-10-29 | Air-ground cooperative communication service method and system based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114020016B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229685A (en) * | 2016-12-14 | 2018-06-29 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of unmanned Intelligent Decision-making Method of vacant lot one |
CN110650039A (en) * | 2019-09-17 | 2020-01-03 | 沈阳航空航天大学 | Multimodal optimization-based network collaborative communication model for unmanned aerial vehicle cluster-assisted vehicle |
CN110874578A (en) * | 2019-11-15 | 2020-03-10 | 北京航空航天大学青岛研究院 | Unmanned aerial vehicle visual angle vehicle identification and tracking method based on reinforcement learning |
CN111300372A (en) * | 2020-04-02 | 2020-06-19 | 同济人工智能研究院(苏州)有限公司 | Air-ground cooperative intelligent inspection robot and inspection method |
CN111628818A (en) * | 2020-05-15 | 2020-09-04 | 哈尔滨工业大学 | Distributed real-time communication method and device for air-ground unmanned system and multi-unmanned system |
CN112068549A (en) * | 2020-08-07 | 2020-12-11 | 哈尔滨工业大学 | Unmanned system cluster control method based on deep reinforcement learning |
CN112965514A (en) * | 2021-01-29 | 2021-06-15 | 北京农业智能装备技术研究中心 | Air-ground cooperative pesticide application method and system |
CN113029169A (en) * | 2021-03-03 | 2021-06-25 | 宁夏大学 | Air-ground cooperative search and rescue system and method based on three-dimensional map and autonomous navigation |
CN113050678A (en) * | 2021-03-02 | 2021-06-29 | 山东罗滨逊物流有限公司 | Autonomous cooperative control method and system based on artificial intelligence |
CN113160554A (en) * | 2021-02-02 | 2021-07-23 | 上海大学 | Air-ground cooperative traffic management system and method based on Internet of vehicles |
- 2021-10-29 CN CN202111271084.9A patent/CN114020016B/en active Active
Non-Patent Citations (2)
Title |
---|
Zhou Siquan et al., "UAV-UGV heterogeneous time-varying formation tracking control for air-ground cooperative operations", Aero Weaponry * |
Xu Wenjing, "Dynamic cooperative design of UAVs and unmanned ground vehicles in uncertain environments", Journal of Luoyang Institute of Science and Technology (Natural Science Edition) * |
Also Published As
Publication number | Publication date |
---|---|
CN114020016B (en) | 2022-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bayerlein et al. | UAV path planning for wireless data harvesting: A deep reinforcement learning approach | |
CN109547938B (en) | Trajectory planning method for unmanned aerial vehicle in wireless sensor network | |
CN110049566B (en) | Downlink power distribution method based on multi-unmanned-aerial-vehicle auxiliary communication network | |
CN105841702A (en) | Method for planning routes of multi-unmanned aerial vehicles based on particle swarm optimization algorithm | |
Dai et al. | Mobile crowdsensing for data freshness: A deep reinforcement learning approach | |
CN115278729B (en) | Unmanned plane cooperation data collection and data unloading method in ocean Internet of things | |
CN115499921A (en) | Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network | |
CN116974751A (en) | Task scheduling method based on multi-agent auxiliary edge cloud server | |
CN113055078A (en) | Effective information age determination method and unmanned aerial vehicle flight trajectory optimization method | |
CN117289691A (en) | Training method for path planning agent for reinforcement learning in navigation scene | |
CN107786989B (en) | Lora intelligent water meter network gateway deployment method and device | |
Du et al. | Virtual relay selection in LTE-V: A deep reinforcement learning approach to heterogeneous data | |
Chen et al. | A fast coordination approach for large-scale drone swarm | |
CN114020016B (en) | Air-ground cooperative communication service method and system based on machine learning | |
CN114895710A (en) | Control method and system for autonomous behavior of unmanned aerial vehicle cluster | |
Cui et al. | Model-free based automated trajectory optimization for UAVs toward data transmission | |
Zeng et al. | The study of DDPG based spatiotemporal dynamic deployment optimization of Air-Ground ad hoc network for disaster emergency response | |
CN115809751B (en) | Two-stage multi-robot environment coverage method and system based on reinforcement learning | |
Zhang et al. | Trajectory design for UAV-based inspection system: A deep reinforcement learning approach | |
CN111880568A (en) | Optimization training method, device and equipment for automatic control of unmanned aerial vehicle and storage medium | |
CN114520991B (en) | Unmanned aerial vehicle cluster-based edge network self-adaptive deployment method | |
Bhandarkar et al. | User coverage maximization for a uav-mounted base station using reinforcement learning and greedy methods | |
CN113741418B (en) | Method and device for generating cooperative paths of heterogeneous vehicle and machine formation | |
CN114594793A (en) | Path planning method for base station unmanned aerial vehicle | |
CN113919188B (en) | Relay unmanned aerial vehicle path planning method based on context-MAB |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||