CN114020016A

CN114020016A - Air-ground cooperative communication service method and system based on machine learning

Info

Publication number: CN114020016A
Application number: CN202111271084.9A
Authority: CN
Inventors: 白成超; 郭继峰; 颜鹏; 郑红星
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-02-08
Anticipated expiration: 2041-10-29
Also published as: CN114020016B

Abstract

An air-ground cooperative communication service method and system based on machine learning relates to the technical field of air-ground cooperative communication service and is used for solving the problems of low service quality and low efficiency caused by the fact that communication service is provided only by an unmanned aerial vehicle in the prior art. The technical points of the invention comprise: acquiring environment information of each unmanned aerial vehicle and unmanned vehicles in communication service; and inputting the environment information into a pre-trained deep neural network model, and resolving to obtain a cooperative communication service strategy instruction of the unmanned aerial vehicle and the unmanned aerial vehicle. The invention can solve the problem of mutual communication between the ground user and the outside or between the ground users after the ground communication base station is damaged, and can solve the problem of insufficient available quantity of mobile communication equipment.

Description

Air-ground cooperative communication service method and system based on machine learning

Technical Field

The invention relates to the technical field of air-ground cooperative communication service, in particular to an air-ground cooperative communication service method and system based on machine learning.

Background

The unmanned aerial vehicle mainly has two ways for providing communication service for ground users, wherein the first way is to use the unmanned aerial vehicle as a relay point for communication; in this usage mode, the drone is responsible for forwarding communication messages between the user and the base station, thereby providing communication services to the user. How to optimize the placement of drones and how to assign communication channels to different users in this communication service model needs to be addressed. The general method is to model the above problems as a multi-objective optimization problem, and solve the problem by using a convex optimization method or an intelligent algorithm to obtain the position distribution of the unmanned aerial vehicle and the allocation strategy of the communication channel. The second approach is to provide communication services directly to users using drones; in this mode, the drone acts directly as an over-the-air communications base station, providing direct communications services to the user. How to optimize the dynamic trajectory of the unmanned aerial vehicle in the communication service mode needs to be solved for providing high-quality communication service for users. It is common practice to model the above problem as an optimization problem solution, the considered constraint conditions include energy constraint, maximum data throughput constraint, etc., and the optimization goal is to make the drone provide the best possible communication service for the user. The general methods are convex optimization methods and some machine learning methods.

Although the above approach enables drones to provide high quality communication services to users in some simple environments, there are still some problems that are not addressed. First, in practice, the number of available drones is limited, and the services provided are also limited, so it is difficult to provide high-quality services to users distributed in various environments; secondly, as the communication link can be blocked by the obstacles distributed in the environment, the user blocked by the obstacles is difficult to obtain the service of the unmanned aerial vehicle, and therefore, the unmanned aerial vehicle can not provide high-quality communication service for all users in wide-area and complex environments.

Disclosure of Invention

In view of the above problems, the present invention provides an air-ground cooperative communication service method and system based on machine learning, so as to solve the problems of low service quality and low efficiency caused by only providing communication service by an unmanned aerial vehicle in the prior art.

According to an aspect of the present invention, a method for air-ground cooperative communication service based on machine learning is provided, the method comprising the following steps:

acquiring environment information of each unmanned aerial vehicle and unmanned vehicles in communication service;

and step two, inputting the environmental information into a pre-trained deep neural network model, and resolving to obtain a cooperative communication service strategy instruction of the unmanned aerial vehicle and the unmanned vehicle.

Further, the environment information corresponding to each unmanned aerial vehicle in the step one comprises user state information in a communication service area, a plurality of unmanned aerial vehicle position information nearest to the current unmanned aerial vehicle, and a plurality of unmanned vehicle position information nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, a plurality of unmanned vehicle position information nearest to the current unmanned vehicle and a plurality of unmanned vehicle position information nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.

Further, in the first step, the user state information includes a plurality of user position information having a minimum ranking factor with respect to the current unmanned aerial vehicle or unmanned aerial vehicle, the communication average service quality of all users, and the communication service quality standard deviation; the calculation formula of the ranking factor is as follows:

in the formula, ρ_kRepresenting a ranking factor of user k relative to the drone or drone vehicle; d_ikRepresents the distance of the drone or drone relative to user k; alpha is alpha_ikRepresenting the included angle between the speed direction of the unmanned aerial vehicle or the unmanned vehicle and a connecting line between the unmanned aerial vehicle or the unmanned vehicle and a user k;

indicating the communication service quality of user k at time t; d_max,Q_maxIs a normalized coefficient; lambda [ alpha ]₁,λ₂,λ₃Is a scaling factor.

Further, the process of the deep neural network model pre-training in the step two includes:

step two, initializing communication service strategies of unmanned aerial vehicle and unmanned aerial vehicle

And target strategy

Initializing unmanned aerial vehicles and unmanned vehicle value network

And enable policy networks for drones

With its target network

Are identical, i.e. that

Policy network enabling unmanned vehicles simultaneously

With its target network

Are identical, i.e. that

Step two, in each interaction period, the unmanned aerial vehicle and the unmanned vehicle respectively collect interaction data { o } with the environment_t(u_i),a_t(u_i),r_t+1(u_i),o_t+1(u_i) And { o }_t(v_j),a_t(v_j),r_t+1(v_j),o_t+1(v_j) In which o is_t(u_i) Representing the environmental information observed by drone i at time t,

representing the action command executed by the unmanned aerial vehicle i at time t, r_t+1(u_i) Indicating the prize value, o, received by drone i at time t +1_t+1(u_i) Representing environmental information observed by the unmanned aerial vehicle i at the moment t + 1; o_t(v_j) Representing environmental information observed by the unmanned vehicle j at time t,

indicates the action command, r, executed by the unmanned vehicle j at the time t_t+1(v_j) Indicating the reward value, o, received by the unmanned vehicle j at time t +1_t+1(v_j) Representing the environmental information observed by the unmanned vehicle j at the moment t + 1;

step two, calculating an advantage function by using the collected interaction data, wherein the advantage functions of the unmanned aerial vehicle i and the unmanned aerial vehicle j are calculated as follows:

in the formula,

and

respectively representing the advantage functions of the unmanned aerial vehicle i and the unmanned vehicle j; γ is a discount factor, between (0, 1);

step two, repeating the step two to the step two until reaching the set maximum step length T;

step two, calculating by using the interaction data collected in the step and the calculated advantage function to obtain the loss values of the unmanned aerial vehicle strategy and the unmanned aerial vehicle strategy as follows:

in the formula, L^CLIP(θ_u) And L^CLIP(θ_v) Respectively representing the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned aerial vehicle; e is a constant, and the value range is between (0, 1); r is_i ^t(θ_u) Is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, r_i ^t(θ_v) The ratio of the actual strategy to the target strategy of the unmanned vehicle is obtained;

step two and six, minimizing L^CLIP(θ_u) And L^CLIP(θ_v) Updating communication service strategy networks of the unmanned aerial vehicle and the unmanned aerial vehicle;

step two, calculating the loss value of the unmanned plane value function and the unmanned plane value function by using the interactive data collected in the step two as follows:

in the formula, L^V(φ_u) Is the loss value of the unmanned aerial vehicle value function, L^V(φ_v) Loss value of the unmanned vehicle value function;

step two eight, minimize L^V(φ_u) And L^V(φ_v) Updating the unmanned aerial vehicle and the unmanned aerial vehicle value network;

step two, updating the unmanned aerial vehicle target strategy network and the unmanned aerial vehicle target strategy network: theta'_u←θ^u，θ′_v←θ^v；

Twenty, repeating the second step to the second step till the network training is converged to obtain the trained deep neural network model.

Further, the specific process of resolving and obtaining the collaborative communication service policy instruction of the unmanned aerial vehicle and the unmanned aerial vehicle in the step two includes: the output value of the trained deep neural network model comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and the unmanned vehicle control instruction is the combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction; and selecting the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value as an unmanned aerial vehicle actual control instruction, and selecting the combination of the unmanned vehicle linear speed control instruction and the unmanned vehicle angular speed control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction.

According to another aspect of the present invention, a machine learning-based air-ground cooperative communication service system is provided, which includes:

the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and the unmanned vehicles in the communication service;

and the instruction resolving module is used for inputting the environmental information into a pre-trained deep neural network model and resolving to obtain a collaborative communication service strategy instruction of the unmanned aerial vehicle and the unmanned aerial vehicle.

Further, the environment information corresponding to each unmanned aerial vehicle in the data acquisition module includes user state information in a communication service area, a plurality of pieces of unmanned aerial vehicle position information nearest to the current unmanned aerial vehicle, and a plurality of pieces of unmanned vehicle position information nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, a plurality of unmanned vehicle position information nearest to the current unmanned vehicle and a plurality of unmanned vehicle position information nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.

Further, the user state information in the data acquisition module includes a plurality of user position information having a minimum ranking factor with respect to the current unmanned aerial vehicle or unmanned vehicle, the communication average service quality of all users, and the communication service quality standard deviation; the calculation formula of the ranking factor is as follows:

Further, the instruction resolving module comprises a model training submodule, the model training submodule is used for pre-training the deep neural network model, and the pre-training process comprises the following steps:

And target strategy

Initializing unmanned aerial vehicles and unmanned vehicle value network

And enable policy networks for drones

With its target network

Are identical, i.e. that

Policy network enabling unmanned vehicles simultaneously

With its target network

Are identical, i.e. that

in the formula,

and

Furthermore, the instruction resolving module further comprises a probability selection submodule, wherein the probability selection submodule is used for selecting an unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value from the trained deep neural network model output values as an unmanned aerial vehicle actual control instruction, and selecting a combination of an unmanned aerial vehicle linear speed control instruction corresponding to the maximum probability value and an unmanned aerial vehicle angular speed control instruction as an unmanned aerial vehicle actual control instruction; the output value of the deep neural network model comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and the unmanned vehicle control instruction is the combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction.

The beneficial technical effects of the invention are as follows:

the invention provides a communication service for ground users by the cooperation of an unmanned aerial vehicle and an unmanned vehicle, which can solve the problem of mutual communication between the ground users and the outside or between the ground users after a ground communication base station is damaged, and can solve the problem of insufficient availability of mobile communication equipment. Compared with the traditional communication service method, the method has the following advantages: 1) the communication service system is provided with a plurality of unmanned aerial vehicles and unmanned vehicles, and can provide high-quality and fair communication service for ground users; 2) by adding unmanned vehicles into the communication service system, the problem of insufficient quantity of available communication service unmanned vehicles can be solved; 3) the cooperative communication service strategy of the unmanned aerial vehicle and the unmanned aerial vehicle is trained by using a deep reinforcement learning method, so that the cooperative communication service strategy can adapt to the change of the environment, has higher robustness and stronger environment adaptability, and can execute communication service tasks in various complex environments; 4) the number of the unmanned aerial vehicles and the number of the unmanned aerial vehicles can be adapted to change, and meanwhile, the number of the ground users can be adapted to change.

Drawings

The present invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification, and which are used to further illustrate preferred embodiments of the present invention and to explain the principles and advantages of the present invention.

Fig. 1 is a schematic view of a communication service scenario between an unmanned vehicle and an unmanned aerial vehicle in an embodiment of the present invention.

Fig. 2 is a schematic diagram of a deep neural network structure in an embodiment of the present invention.

Fig. 3 is a schematic diagram of a reward value curve obtained in the cooperative strategy training process of the unmanned aerial vehicle and the unmanned aerial vehicle in the embodiment of the invention.

Fig. 4 is a trace graph of a cooperative communication service between an unmanned aerial vehicle and an unmanned aerial vehicle in the embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the disclosure, exemplary embodiments or examples of the disclosure are described below with reference to the accompanying drawings. It is obvious that the described embodiments or examples are only some, but not all embodiments or examples of the invention. All other embodiments or examples obtained by a person of ordinary skill in the art based on the embodiments or examples of the present invention without any creative effort shall fall within the protection scope of the present invention.

In order to solve the problem of communication service of ground users, the invention provides an air-ground cooperative communication service method based on machine learning.

The embodiment of the invention provides an air-ground cooperative communication service method based on machine learning, which specifically comprises the following steps:

the method comprises the following steps: the unmanned aerial vehicle and the unmanned aerial vehicle acquire environment information in communication service;

an unmanned vehicle and unmanned vehicle communication service scenario is shown in fig. 1. According to the embodiment of the invention, the environment information acquired by the unmanned aerial vehicle

Comprises three parts of contents, wherein,

indicating unmanned plane u_iObtaining status information of users in a communication service area, including relative drones u in the communication service area_i5 user position information with minimum ranking factor, the position information being included at drone u_iDistance d in the course coordinate system_ijAngle alpha_ijJ-1, 2, 9, and the average quality of service of the communication for all users

And standard deviation of communication service quality

User k is relative to unmanned aerial vehicle u_iOf the order factor p_kThe calculation is shown below:

in the formula (d)_ikIndicating unmanned plane u_iDistance relative to user k; alpha is alpha_ikIndicating unmanned plane u_iSpeed direction and unmanned plane u_iThe included angle between the user k and the connecting line;

indicates the channels that user k has at time tThe quality of the trust service; d_max,Q_maxTo normalize the coefficient, λ₁,λ₂,λ₃Is a scaling factor.

Unmanned plane u with distance representation_iPosition information of the nearest 3 drones, including at drone u_iDistance d in the course coordinate system_ijAngle alpha_ijJ is 1,2,3, i.e.

Unmanned plane u with distance representation_iPosition information of the nearest 3 unmanned vehicles including unmanned vehicle u_iDistance and angle under the course coordinate system.

The environmental information of unmanned vehicle perception is similar with the environmental information of unmanned aerial vehicle perception, shows as:

also comprises three parts of information; wherein,

indicating unmanned vehicle c_jPerceived user status, including relatively unmanned vehicle c in a communication service area_jPosition information of 5 users with minimum ranking factor, communication average service quality of all users

And standard deviation of communication service quality

Indicating unmanned vehicle c_jThe sensed state of the unmanned aerial vehicle is unmanned aerial vehicle position information;

indicating unmanned vehicle c_jAnd the sensed states of the other unmanned vehicles are other unmanned vehicle position information.

Step two: inputting environmental information acquired by the unmanned aerial vehicle and the unmanned vehicle into a pre-trained deep neural network model, and resolving to obtain communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle;

according to the embodiment of the invention, the deep neural network structure is shown in fig. 2, and comprises a 3-layer fully-connected network, wherein the first layer and the second layer have 128 nodes, the activation function is a nonlinear rectifying unit (ReLU), the third layer has 7 output nodes, and the activation function is a SoftMax function, and the output value is limited between (0, 1).

The deep neural network pre-training process comprises the following steps: collecting interaction data of unmanned aerial vehicle and unmanned vehicle and environment, namely environment information, and estimating advantage function by using the data

And

the policy loss function L is then calculated^CLIP(θ_u)、L^CLIP(θ_v) And a loss function L of the value function^V(φ_u)、L^V(φ_v) And finally, updating the strategy network and the value function network through the loss function of the minimized strategy loss function and the loss function of the value function, thereby obtaining a well-trained deep neural network model. The specific training process is as follows:

(1) communication service strategy for initializing unmanned aerial vehicle and unmanned vehicle

And target strategy

Initializing unmanned aerial vehicles and unmanned vehicle value network

And enable policy networks for drones

With its target network

Are identical, i.e. that

At the same time, make strategy network of unmanned vehicle

With its target network

Are identical, i.e. that

(2) In each time step, namely an interaction period, the unmanned aerial vehicle and the unmanned aerial vehicle respectively collect interaction data { o ] with the environment_t(u_i),a_t(u_i),r_t+1(u_i),o_t+1(u_i) And { o }_t(v_j),a_t(v_j),r_t+1(v_j),o_t+1(v_j) In which o is_t(u_i) Representing the environmental information observed by drone i at time t,

representing the action command executed by the unmanned aerial vehicle i at time t, r_t+1(u_i) Indicating the prize value, o, received by drone i at time t +1_t+1(u_i) Environmental information, o, representing the observation of unmanned aerial vehicle i at time t +1_t(v_j) Representing environmental information observed by the unmanned vehicle j at time t,

(3) and calculating an advantage function by using the collected interaction data, wherein the advantage functions of the unmanned aerial vehicle i and the unmanned vehicle j are calculated as follows:

in the formula,

and

respectively representing the advantage functions of the unmanned aerial vehicle i and the unmanned aerial vehicle j, wherein gamma is a discount factor and is between (0 and 1);

(4) repeating the steps (2) and (3) until the set maximum step length T is reached;

(5) calculating loss values of the unmanned aerial vehicle strategy and the unmanned aerial vehicle strategy by using the interactive data collected in the steps (2), (3) and (4) and the calculated advantage function as follows:

in the formula, L^CLIP(θ_u) And L^CLIP(θ_v) Respectively representing the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned aerial vehicle, wherein the epsilon is a constant, and the value range is (0, 1);clip is a function, clip (r)_i ^t(θ_u) 1-e, 1+ e) represents the sum of r_i ^t(θ_u) Is limited to [ 1-e, 1+ e]To (c) to (d); r is_i ^t(θ_u) Is the ratio of the actual strategy to the target strategy of the unmanned aerial vehicle, r_i ^t(θ_v) Respectively calculating the ratio of the actual strategy to the target strategy of the unmanned vehicle as follows:

(6) minimization of L^CLIP(θ_u) And L^CLIP(θ_v) Updating communication service strategy networks of the unmanned aerial vehicle and the unmanned aerial vehicle;

(7) calculating the loss values of the unmanned plane value function and the unmanned plane value function by using the interactive data collected in the steps (2), (3) and (4) as follows:

(8) minimization of L^V(φ_u) And L^V(φ_v) Updating the unmanned aerial vehicle and the unmanned aerial vehicle value network;

(9) updating unmanned aerial vehicle target strategy network and unmanned aerial vehicle target strategy network theta'_u←θ^u；θ′_v←θ^v；

(10) And (5) repeating the steps (2) to (9) until the network training is converged, and obtaining a trained deep neural network model.

In the pre-training process, unmanned plane u_iThe prize value achieved may be represented by the following equation:

r_t(u_i)＝r_t ^Q(u_i)+r_t ^S(u_i)+r_t ^R(u_i)

in the formula, the first term r_t ^Q(u_i) In relation to the communication service quality of the user, r is when the user has a higher average communication service quality and a lower variance of the communication service quality_t ^Q(u_i) Is large; otherwise r_t ^Q(u_i) Is smaller. Second term r_t ^S(u_i) With unmanned plane u_iIn relation to the distances of other drones and other unmanned vehicles, r is the distance between drones and between drone and unmanned vehicle when the distance is small_t ^S(u_i) Is a negative value; otherwise r_t ^S(u_i) Is 0. Third term r_t ^R(u_i) With unmanned plane u_iRelative to the location of the communication service environment, when the drone u_iWhile in the communication service area, r_t ^R(u_i) Is 0; otherwise r_t ^R(u_i) Is negative.

The design process of the reward function of the unmanned vehicle is the same as that of the reward function of the unmanned vehicle. The communication service strategy of the unmanned aerial vehicle and the unmanned vehicle is trained in a deep reinforcement learning training mode, the unmanned aerial vehicle and the unmanned vehicle learn an effective cooperative communication service strategy through continuous interaction with the environment, high-quality and fair communication service can be provided for ground users, and pseudo codes of a specific implementation process are shown in the following table 1.

The unmanned aerial vehicle and the unmanned vehicle environment information acquired in real time are subjected to a trained deep neural network model, and the output value of the model comprises the probability of selecting a control instruction of each unmanned aerial vehicle

And selecting each ofProbability of man-vehicle control command

Wherein the unmanned aerial vehicle control command is a course declination command of the unmanned aerial vehicle, namely

Degree; the unmanned vehicle control instruction is a linear speed control instruction of the unmanned vehicle

And angular velocity control command

In combination, i.e.

Wherein

Finally, from

Selecting the course deflection angle with the maximum probability as an actual control instruction of the unmanned aerial vehicle, and selecting the course deflection angle with the maximum probability as the actual control instruction of the unmanned aerial vehicle

The linear speed and angular speed combination with the maximum probability is selected as the actual control command of the unmanned vehicle.

The beneficial effects of the invention are further verified through experiments.

The correctness and the rationality of the invention are verified by adopting a digital simulation mode. Firstly, a communication service environment with the size of 500m × 500m × 150m is constructed in a Python environment, and the communication service environment comprises 10 users and a dynamic communication service system consisting of a plurality of unmanned aerial vehicles and unmanned vehicles. The unmanned aerial vehicle flies at a constant speed and a constant height, the flying speed is 10m/s, the maximum speed of the unmanned aerial vehicle is 10m/s, the maximum moving speed of a user is 1m/s, and the unmanned aerial vehicle moves randomly in a communication service area. The simulation test software environment is Windows10+ Python3.7, and the hardware environment is AMD Ryzen 53550H CPU +16.0 GBRAM.

The experiment first verifies whether the communication service control strategy training of the unmanned aerial vehicle and the unmanned aerial vehicle is convergent. 10000 training rounds are performed in the experiment, the average reward value obtained by the unmanned aerial vehicle and the unmanned aerial vehicle in each 100 training rounds is recorded, and a curve is drawn as shown in fig. 3. As can be seen from fig. 3, as the training progresses, the drone and the unmanned vehicle can obtain a stable reward value, which is between 6.5 and 7, indicating that the communication service strategies of the drone and the unmanned vehicle approach convergence, and the drone and the unmanned vehicle can provide high-quality and fair communication service for the user.

And then carrying out experimental verification on the cooperation strategy of the unmanned aerial vehicle and the unmanned vehicle, wherein the verification result is shown in figure 4. As can be seen from fig. 4, the unmanned aerial vehicle and the unmanned vehicle can provide communication services for different users respectively, and the provided communication services are relatively uniform, that is, the unmanned aerial vehicle and the unmanned vehicle can cooperate to provide fair communication services for users on the ground.

The invention provides a communication service for ground users by the cooperation of an unmanned aerial vehicle and an unmanned vehicle, and can solve the problem of mutual communication between the ground users and the outside or the ground users after disasters or damages of ground communication base stations. Meanwhile, the unmanned aerial vehicle and the unmanned vehicle can solve the problem that available mobile communication equipment is not enough in cooperation, and the advantages of communication services of the unmanned aerial vehicle and the unmanned vehicle are brought into play. Compared with the traditional communication service strategy, the air-ground cooperative communication service strategy based on learning provided by the invention has the following advantages: 1) the communication service system is provided with a plurality of unmanned aerial vehicles and unmanned vehicles, and can provide high-quality and fair communication service for ground users. 2) By adding unmanned vehicles into the communication service system, the problem of insufficient quantity of available communication service unmanned aerial vehicles can be solved. 3) The cooperative communication service strategy of the unmanned aerial vehicle and the unmanned aerial vehicle is trained by using the deep reinforcement learning method, so that the cooperative communication service strategy can adapt to the change of the environment, has higher robustness and stronger environment adaptability, and can execute communication service tasks in various complex environments. The unmanned aerial vehicle and unmanned vehicle air-ground cooperative communication service strategy provided by the invention can adapt to the change of the number of unmanned aerial vehicles and unmanned vehicles and can adapt to the change of the number of ground users. The method can realize the cooperation of the unmanned aerial vehicle and the unmanned vehicle to provide high-quality and fair communication service for ground users, and provides a new technical approach for providing a post-disaster user communication service.

Another embodiment of the present invention provides an air-ground cooperative communication service system based on machine learning, including:

the data acquisition module is used for acquiring the environment information of each unmanned aerial vehicle and the unmanned vehicles in the communication service; the environment information corresponding to each unmanned aerial vehicle comprises user state information in a communication service area, a plurality of unmanned aerial vehicle position information nearest to the current unmanned aerial vehicle, and a plurality of unmanned vehicle position information nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, a plurality of unmanned vehicle position information nearest to the current unmanned vehicle and a plurality of unmanned vehicle position information nearest to the current unmanned vehicle; the position information comprises a distance parameter and an angle parameter; the user state information comprises a plurality of user position information with the minimum ranking factor relative to the current unmanned aerial vehicle or unmanned aerial vehicle, the communication average service quality of all users and the standard deviation of the communication service quality; the calculation formula of the ranking factor is as follows:

indicating the communication service quality of user k at time t; d_max,Q_maxIs a normalized coefficient; lambda [ alpha ]₁,λ₂,λ₃Is a proportionality coefficient;

the instruction resolving module is used for inputting the environment information into a pre-trained deep neural network model and resolving to obtain a cooperative communication service strategy instruction of the unmanned aerial vehicle and the unmanned aerial vehicle; the method comprises a model training submodule and a probability selection submodule;

the model training submodule is used for pre-training the deep neural network model, and the pre-training process comprises the following steps:

And target strategy

Initializing unmanned aerial vehicles and unmanned vehicle value network

And enable policy networks for drones

With its target network

Are identical, i.e. that

Policy network enabling unmanned vehicles simultaneously

With its target network

Are identical, i.e. that

in the formula,

and

Twenty, repeating the second step to the second step until the network training is converged to obtain a trained deep neural network model;

the probability selection submodule is used for selecting the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value from the trained deep neural network model output values as an unmanned aerial vehicle actual control instruction, and selecting the combination of the unmanned aerial vehicle linear speed control instruction corresponding to the maximum probability value and the unmanned aerial vehicle angular speed control instruction as an unmanned aerial vehicle actual control instruction; the output value of the deep neural network model comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and the unmanned vehicle control instruction is the combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction.

The functions of the air-ground cooperative communication service system based on machine learning according to the embodiment of the present invention can be described by the aforementioned air-ground cooperative communication service method based on machine learning, so that the detailed description of the embodiment is omitted, and reference may be made to the above method embodiments, which are not described herein again.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims

1. A method for air-ground cooperative communication service based on machine learning is characterized by comprising the following steps:

2. The air-ground cooperative communication service method based on machine learning according to claim 1, wherein the environment information corresponding to each unmanned aerial vehicle in the step one comprises user state information in a communication service area, a plurality of unmanned aerial vehicle position information nearest to the current unmanned aerial vehicle, and a plurality of unmanned vehicle position information nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, a plurality of unmanned vehicle position information nearest to the current unmanned vehicle and a plurality of unmanned vehicle position information nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.

3. The air-ground cooperative communication service method based on machine learning of claim 2, wherein the user status information in step one comprises a plurality of user location information with minimum ranking factor relative to the current unmanned aerial vehicle or unmanned vehicle, communication average service quality of all users and communication service quality standard deviation; the calculation formula of the ranking factor is as follows:

4. The air-ground cooperative communication service method based on machine learning according to claim 3, wherein the process of deep neural network model pre-training in the second step comprises:

And target strategy

Initializing unmanned aerial vehicles and unmanned vehicle value network

And enable policy networks for drones

With its target network

Are identical, i.e. that

Policy network enabling unmanned vehicles simultaneously

With its target network

Are identical, i.e. that

in the formula,

and

in the formula, L^CLIP(θ_u) And L^CLIP(θ_v) Respectively representing the strategy loss value of the unmanned aerial vehicle and the strategy loss value of the unmanned aerial vehicle; e is a constant, and the value range is between (0, 1);

is the ratio of the actual strategy to the target strategy of the unmanned plane,

the ratio of the actual strategy to the target strategy of the unmanned vehicle is obtained;

5. The air-ground cooperative communication service method based on machine learning according to claim 4, wherein the specific process of obtaining cooperative communication service strategy instructions of the unmanned aerial vehicle and the unmanned vehicle by solving in the step two comprises: the output value of the trained deep neural network model comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and the unmanned vehicle control instruction is the combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction; and selecting the unmanned aerial vehicle course deflection angle instruction corresponding to the maximum probability value as an unmanned aerial vehicle actual control instruction, and selecting the combination of the unmanned vehicle linear speed control instruction and the unmanned vehicle angular speed control instruction corresponding to the maximum probability value as the unmanned vehicle actual control instruction.

6. An air-ground cooperative communication service system based on machine learning, comprising:

7. The air-ground cooperative communication service system based on machine learning of claim 6, wherein the environment information corresponding to each unmanned aerial vehicle in the data acquisition module comprises user state information in a communication service area, a plurality of unmanned aerial vehicle position information nearest to the current unmanned aerial vehicle, and a plurality of unmanned vehicle position information nearest to the current unmanned aerial vehicle; the environment information corresponding to each unmanned vehicle comprises user state information in a communication service area, a plurality of unmanned vehicle position information nearest to the current unmanned vehicle and a plurality of unmanned vehicle position information nearest to the current unmanned vehicle; wherein the position information comprises a distance parameter and an angle parameter.

8. The air-ground cooperative communication service system based on machine learning of claim 7, wherein the user status information in the data acquisition module comprises a plurality of user position information with minimum ranking factor relative to the current unmanned aerial vehicle or unmanned aerial vehicle, communication average service quality of all users and communication service quality standard deviation; the calculation formula of the ranking factor is as follows:

9. The air-ground cooperative communication service system based on machine learning of claim 8, wherein the instruction resolving module comprises a model training submodule for pre-training a deep neural network model, and the pre-training process comprises:

And target strategy

Initializing unmanned aerial vehicles and unmanned vehicle value network

And enable policy networks for drones

With its target network

Are identical, i.e. that

Policy network enabling unmanned vehicles simultaneously

With its target network

Are identical, i.e. that

in the formula,

and

10. The air-ground cooperative communication service system based on machine learning of claim 9, wherein the instruction resolving module further comprises a probability selection sub-module, the probability selection sub-module is configured to select an unmanned aerial vehicle heading deflection angle instruction corresponding to a maximum probability value from the trained deep neural network model output values as an unmanned aerial vehicle actual control instruction, and select a combination of an unmanned aerial vehicle linear velocity control instruction corresponding to the maximum probability value and an unmanned aerial vehicle angular velocity control instruction as the unmanned aerial vehicle actual control instruction; the output value of the deep neural network model comprises the probability of selecting each unmanned aerial vehicle control instruction and the probability of selecting each unmanned vehicle control instruction, the unmanned aerial vehicle control instruction is an unmanned aerial vehicle course deflection angle instruction, and the unmanned vehicle control instruction is the combination of an unmanned vehicle linear speed control instruction and an unmanned vehicle angular speed control instruction.