CN116520887A - Adaptive adjustment method for cluster structure of hybrid multiple unmanned aerial vehicles

Info

Publication number: CN116520887A
Application number: CN202310640306.2A
Authority: CN
Inventors: 蒋嶷川; 钱睿溢; 姜元爽
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2023-05-31
Filing date: 2023-05-31
Publication date: 2023-08-01

Abstract

The invention provides an adaptive adjustment method for the cluster structure of a hybrid multi-UAV system. A type-specific state encoder aggregates the attributes and capabilities of heterogeneous unmanned aerial vehicles (UAVs), which differ in dimension, into features of a common dimension, reducing the dimension of the solution space. A reinforcement-learning-based cluster clustering algorithm then fuses the processed multi-dimensional heterogeneous data, learns a global adjustment strategy, adjusts the cluster structure, and reconstructs the network. The technical core of the state encoder is that a separate neural network is trained for each UAV type and handles that type's attributes and capabilities of differing dimensions, so that the differences among heterogeneous UAVs are better reflected. The method aggregates the differently dimensioned features of each UAV type in a hybrid scenario, addresses the multi-dimensional heterogeneous data fusion problem, and improves the stability and efficiency of the hybrid multi-UAV cluster.

Description

A hybrid multi-UAV swarm structure adaptive adjustment method

Technical Field

The present invention relates to the field of unmanned aerial vehicles (UAVs), and in particular to a method for adaptively adjusting the cluster structure of a hybrid multi-UAV system.

Background Art

In a hybrid multi-UAV cluster scenario, multi-dimensional heterogeneity causes heterogeneous UAVs to differ greatly in motion state and communication capability. These differences make the topology inside a cluster more likely to change dynamically, which alters the relative positions of UAVs and the communication costs between them and makes the cluster structure unstable. Cluster structure instability refers to frequent changes in the cluster structure caused by factors such as changes in cluster membership or broken communication connections. When a multi-UAV cluster system becomes unstable in this way, the cluster structure must be adjusted to restore stability and preserve the overall operating efficiency of the system. In a hybrid multi-UAV cluster scenario, a cluster structure adjustment method must also solve the problem of fusing multi-dimensional heterogeneous data. Most existing multi-UAV cluster structure adjustment models do not consider this problem. For example, when computing the motion-state similarity of heterogeneous UAVs, they consider only each UAV's current speed and direction and ignore attributes in other dimensions, such as the acceleration of variable-speed UAVs. As a result, state extraction is inaccurate and the models easily fall into local optima.

If instead the attributes of all dimensions are considered directly, by extending every UAV's attribute vector to include every attribute type (for example, adding an acceleration attribute with value 0 and a maximum-speed attribute to fixed-speed UAVs), the dimension of the solution space increases significantly. The multi-dimensional heterogeneous data fusion problem therefore has a major impact on optimizing the operating efficiency of the whole system in hybrid multi-UAV cluster scenarios.

Fusing multi-dimensional heterogeneous data in hybrid multi-UAV cluster scenarios aims to make the cluster structure adjustment model more flexible, reduce the dimension of the solution space, obtain a better optimization result, and maximize the operating efficiency of the whole system. To achieve this goal, a separate state encoder can be designed for each type of heterogeneous UAV in the system to aggregate its attributes of differing dimensions. Each state encoder uses a specially trained neural network and takes the UAV's historical states into account. The attributes, capabilities, and other states of a heterogeneous UAV, which differ in dimension across types, are processed by the encoder for the corresponding type and aggregated into UAV observation features of a common dimension. For example, suppose one type of UAV has a two-dimensional residual-energy attribute consisting of residual flight energy and residual communication energy. The state encoder for that type aggregates the two according to the rates at which each energy was consumed in the UAV's historical states; that is, it forms a weighted sum of the two energy attributes with suitable weights to obtain the energy attribute in the UAV observation feature.
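The residual-energy example can be illustrated with a minimal Python sketch. It is not the trained encoder: the rule that each weight is proportional to the historical consumption rate, and the names aggregate_energy, flight_rate, and comm_rate, are assumptions chosen only to make the weighted sum concrete.

```python
def aggregate_energy(flight_energy, comm_energy, flight_rate, comm_rate):
    """Fuse two residual-energy attributes into one scalar by a weighted sum.

    Assumption (not from the patent): each weight is proportional to the
    historical consumption rate of its energy pool, so the pool that drains
    faster contributes more to the fused attribute.
    """
    total_rate = flight_rate + comm_rate
    if total_rate <= 0:
        # No consumption observed yet: fall back to equal weights.
        w_flight = w_comm = 0.5
    else:
        w_flight = flight_rate / total_rate
        w_comm = comm_rate / total_rate
    return w_flight * flight_energy + w_comm * comm_energy


# Residual flight energy 0.8, residual communication energy 0.4,
# with flight energy historically consumed three times as fast.
energy_feature = aggregate_energy(0.8, 0.4, flight_rate=0.03, comm_rate=0.01)
```

In the method itself the weights are produced by a neural network trained on the UAV type's history rather than by a fixed rule.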

To fuse the processed multi-dimensional heterogeneous features with the network structure, the method introduces a reinforcement-learning-based cluster clustering algorithm. The algorithm builds a graph attention network on the system's network structure, uses a self-attention mechanism to fuse the features of all heterogeneous UAVs, learns how good different cluster structure adjustment strategies are through an Actor-Critic model, and finally outputs the optimal cluster structure adjustment strategy for the current scenario.

Summary of the Invention

Technical problem:

The purpose of the present invention is to provide a cluster structure adjustment method for hybrid multi-UAV clusters. It mainly addresses the difficulty of fusing multi-dimensional heterogeneous data in hybrid scenarios, which leads to an overly large algorithm solution space and frequent changes in cluster structure. The core of the invention is to take the multi-dimensional information about the attributes and capabilities of heterogeneous UAVs and aggregate it into a single state attribute, reducing the dimension of the solution space. A reinforcement-learning-based cluster clustering algorithm then fuses the heterogeneous UAV features with cluster structure data, learns a global cluster structure adjustment strategy, adjusts the cluster structure according to that strategy, and reconstructs the communication network. This achieves global cluster structure adjustment and further improves the stability of the cluster structure and the efficiency of the system.

Technical solution:

In a hybrid multi-UAV cluster system, each UAV undertakes a different number of tasks, and UAVs in the same cluster are often closely connected. To establish a stable intra-cluster communication network at low cost, each cluster selects a communication UAV, the Cluster Head (CH), and every member UAV of the cluster, the Cluster Members (CMs), establishes a communication connection with the CH.

In the initial stage, the given scenario is a hybrid multi-UAV cluster system in which UAVs are uniformly distributed. U types of heterogeneous UAVs with multi-dimensional heterogeneity are designed, and the UAV types are evenly allocated across clusters. To handle the multi-dimensional heterogeneity of UAV features within each cluster, the scheme uses type-specific state encoders to process the state features of each UAV type separately. A separate neural network is trained for each type to process its attribute and capability states of differing dimensions, and a fully connected layer in the network finally aggregates these features into UAV observations of a common dimension.

Each type of heterogeneous UAV has a dedicated neural network trained on the historical states of that type. The network combines the UAV's features over a past period with its current features, which differ in dimension across types, to produce observation features of a common dimension. A reinforcement learning method based on the Actor-Critic architecture uses these observation features to learn a global cluster structure adjustment strategy; the cluster structure is then adjusted according to the learned strategy and the communication network is reconstructed.

Specifically, the neural network for each UAV type uses a reset gate and an update gate to combine the historical state with the current features and update the current features, and uses a fully connected layer to aggregate attributes and capabilities of differing dimensions into observation features of a common dimension. These observation features are fed into a graph attention network, which processes the observations of all UAVs in the system through a self-attention mechanism to obtain an observation feature representing the whole system. From the system observation feature, the Actor network outputs a policy π_θ, which gives the probability of adopting each global adjustment strategy in the current system state.

A global adjustment strategy is selected according to the π_θ output by the Actor network. The utility weights in the strategy are used to form a weighted sum of the state factors of each heterogeneous UAV, giving that UAV's utility. Clusters are then formed according to utility and communication range, and the network structure of the re-clustered hybrid multi-UAV system is reconstructed at two levels, within clusters and across clusters, completing one cluster structure adjustment. Meanwhile, the Critic value network evaluates the optimization effect of this adjustment and computes the value function of the current state and the next state. The Critic state-value function and the Actor policy parameter θ are updated through the policy gradient to train the model.
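The source text does not reproduce the update rule itself. As a hedged illustration only, a standard one-step Actor-Critic update consistent with this description could be written as follows. The discount factor \gamma, the Critic parameters \phi, the learning rates \eta_V and \eta_\pi, and the symbol a(t) for the selected adjustment strategy are all assumed notation that does not appear in the source.

```latex
\begin{aligned}
\delta_t &= r(t) + \gamma\, V_\phi\big(S(t+1)\big) - V_\phi\big(S(t)\big) \\
\phi &\leftarrow \phi + \eta_V\, \delta_t\, \nabla_\phi V_\phi\big(S(t)\big) \\
\theta &\leftarrow \theta + \eta_\pi\, \delta_t\, \nabla_\theta \log \pi_\theta\big(a(t) \mid S(t)\big)
\end{aligned}
```

Here \delta_t is the temporal-difference error formed from the Critic's values of the current and next state; it drives both the Critic update and the policy-gradient step on \theta.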

The method solves the multi-dimensional heterogeneous data fusion problem in hybrid multi-UAV cluster scenarios: it aggregates UAV states with multi-dimensional heterogeneity into UAV observation features of a common dimension and thereby reduces the dimension of the solution space. On this basis, reinforcement learning is used to learn the optimal cluster structure adjustment strategy and achieve the best optimization effect.

Beneficial effects:

(1) Stronger adaptability to hybrid multi-UAV cluster application scenarios. By solving the fusion problem for multi-dimensional heterogeneous data in hybrid multi-UAV clusters, the method fully accounts for the influence of the differently dimensioned attributes and capabilities of heterogeneous UAVs in hybrid scenarios. This addresses the poor adaptability, and the tendency to fall into local optima, of earlier cluster structure adjustment methods that ignore multi-dimensional heterogeneity.

(2) Lower solution-space dimension. Type-specific state encoders process each UAV type separately and aggregate its attribute and capability states of differing dimensions into a dimension shared by all UAVs in the system, which greatly reduces the dimension of the solution space.

(3) Improved stability of the multi-UAV cluster structure. The adaptive cluster structure adjustment method for hybrid scenarios fuses multi-dimensional heterogeneous data, including the differently dimensioned attributes and capabilities of heterogeneous UAVs and the structure of each cluster, and outputs the optimal cluster structure adjustment strategy. This keeps the cluster structure dynamically stable and improves the operating efficiency of the overall system.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a heterogeneous multi-UAV cluster system.

FIG. 2 is an example diagram of part of a cluster structure.

FIG. 3 is a diagram of the main principle of the method of the present invention.

FIG. 4 is a network structure diagram of the method of the present invention.

In FIG. 1 and FIG. 2, different shapes represent different UAV types. In FIG. 2, the CH communication UAV selected in a cluster is responsible for communicating with all other CM member UAVs in the cluster. Communication between homogeneous UAVs is drawn as solid lines, and communication between heterogeneous UAVs as dashed lines.

Detailed Description

The present invention is further explained below with reference to the accompanying drawings and specific embodiments. It should be understood that the following embodiments are intended only to illustrate the invention and not to limit its scope.

As shown in the figures, the present invention provides a method for adaptively adjusting the cluster structure of a hybrid multi-UAV system.

1. Initial stage

In the initial stage, perception is performed on the heterogeneous UAVs, the clusters, and the hybrid multi-UAV cluster. Heterogeneous UAV perception analyzes the type, attributes and capabilities, state, and other features of each UAV. Cluster perception determines the number of clusters currently in the multi-UAV cluster system and the composition of heterogeneous UAVs in each cluster. Hybrid network structure perception determines the network structure formed by the communication connections between UAVs, both within and across clusters, and also computes the continually changing communication costs between heterogeneous UAVs. The three perception objects and perception processes are described in turn:

1.1 Heterogeneous UAV perception process: each UAV is analyzed individually. Any UAV in the k-th cluster C_k can be represented by a triple consisting of its UAV type, its attribute and capability features, and its state features. The type identifies which kind of heterogeneous UAV it is. The attribute and capability feature vector P, covering for example remaining energy and speed, embodies the multi-dimensional heterogeneity: the dimension of P differs between heterogeneous UAVs. The state feature vector v, covering for example UAV number, current position coordinates, direction, and role, has the same dimension for every UAV regardless of type.

1.2 Cluster perception process: in the initial stage, the composition of the UAV clusters is analyzed first. The multi-UAV cluster system consists of multiple heterogeneous UAV clusters C_k, and each cluster C_k consists of heterogeneous UAVs, where N_k is the number of UAVs in the cluster.

1.3 Hybrid network structure perception process: in the hybrid multi-UAV cluster system addressed by this patent, two kinds of connection exist between heterogeneous UAVs: communication connections within a cluster and communication connections across clusters. Together, the communication connections between heterogeneous UAVs form the hybrid network structure of the system. Both kinds of connection are dynamic and change as the cluster structure is adjusted. Combining the intra-cluster and cross-cluster connections, a hybrid multi-UAV cluster system (MAS) with N clusters is described as MAS = (n₁, n₂, …, n_N, E), where n_k = <C_k, CH_k> represents the complete perception of the k-th cluster, C_k is the set of heterogeneous UAVs in the k-th cluster, CH_k is the current communication UAV of the k-th cluster, and E is the adjacency matrix representing the network structure within the hybrid multi-UAV cluster system. A communication connection in the system is represented by a two-tuple of UAVs. The communication cost incurred when a connection is established, and at every subsequent time step to maintain it, is given by a function c of the two endpoint UAVs: the cost between a heterogeneous UAV in cluster C_k and a heterogeneous UAV in cluster C_l, where l may equal k. The function c computes the communication cost from the types of the two UAVs and the distance between them.
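The perception results of 1.1 to 1.3 can be pictured as plain data structures. The following Python sketch assumes hypothetical field names (uav_type, P, v, ch_index) and a hypothetical cost rule in which a per-type-pair coefficient is multiplied by the Euclidean distance; the patent only states that c depends on the two UAV types and their distance.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class UAV:
    uav_type: int  # type number of the heterogeneous UAV
    P: list        # attribute/capability features; length depends on the type
    v: dict        # state features; same keys for every type


@dataclass
class Cluster:
    members: list  # C_k: the heterogeneous UAVs of the cluster
    ch_index: int  # position of the communication UAV CH_k in members


@dataclass
class MAS:
    clusters: list  # n_1 ... n_N, each n_k = <C_k, CH_k>
    E: list         # adjacency matrix of the communication network


# Hypothetical coefficients per unordered type pair (illustrative values).
TYPE_PAIR_COEFF = {(0, 0): 1.0, (0, 1): 1.5, (1, 1): 2.0}


def comm_cost(a, b):
    """Sketch of c: depends only on the two UAV types and their distance."""
    key = tuple(sorted((a.uav_type, b.uav_type)))
    return TYPE_PAIR_COEFF[key] * dist(a.v["position"], b.v["position"])


fixed = UAV(0, P=[0.9, 12.0],
            v={"id": 0, "position": (0.0, 0.0), "direction": 0.0, "role": "CH"})
variable = UAV(1, P=[0.7, 0.5, 10.0, 2.0, 15.0],
               v={"id": 1, "position": (3.0, 4.0), "direction": 1.57, "role": "CM"})
system = MAS(clusters=[Cluster(members=[fixed, variable], ch_index=0)],
             E=[[0, 1], [1, 0]])
cost = comm_cost(fixed, variable)
```

The two example UAVs have attribute vectors P of different lengths but identical state keys in v, which mirrors the distinction drawn in 1.1.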

2. Model construction and problem formulation

To adjust the cluster structure through clustering and routing algorithms in a hybrid multi-UAV cluster system, the state of every UAV must be evaluated. The raw input obtained in the perception stage is therefore used to compute a utility for each UAV, and the utility value reflects the UAV's state. The model defines six factors for UAV utility: remaining energy percentage, degree, average distance to neighbors, average motion-state similarity with neighbors, average holding time of communication connections, and average communication cost. The UAV utility is the weighted sum of these six factors.
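The weighted sum of the six factors can be sketched in a few lines of Python. The factor keys below are hypothetical names, and the sketch assumes each factor has already been normalized to a comparable scale and oriented so that larger values are better; the source does not describe how the factors are normalized.

```python
# Hypothetical keys for the six utility factors, in the order listed above.
FACTORS = (
    "energy_pct",             # remaining energy percentage
    "degree",                 # node degree in the communication network
    "avg_neighbor_distance",  # average distance to neighbors
    "avg_motion_similarity",  # average motion-state similarity with neighbors
    "avg_link_hold_time",     # average holding time of communication connections
    "avg_comm_cost",          # average communication cost
)


def utility(factors, weights):
    """Weighted sum of the six factors; weights[i] plays the role of λ_{i+1}."""
    if len(weights) != len(FACTORS):
        raise ValueError("expected one weight per factor")
    return sum(w * factors[name] for name, w in zip(FACTORS, weights))


uav_factors = {
    "energy_pct": 0.9,
    "degree": 0.5,
    "avg_neighbor_distance": 0.4,
    "avg_motion_similarity": 0.8,
    "avg_link_hold_time": 0.6,
    "avg_comm_cost": 0.7,
}
u = utility(uav_factors, (0.3, 0.1, 0.1, 0.2, 0.2, 0.1))
```

Changing the weight vector changes which UAVs score highest, which is why the weights themselves serve as the adjustment strategy in the sections that follow.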

The weight of each factor affects the computed utility and therefore the accuracy of the UAV state evaluation. As heterogeneous UAVs move, the cluster structure keeps changing, and the utility weights that most accurately reflect UAV state in the current scenario change with it. A cluster structure adjustment method for hybrid multi-UAV cluster scenarios must take multi-dimensional heterogeneity into account, fuse multi-dimensional heterogeneous data such as the differently dimensioned features of heterogeneous UAVs and the cluster structure, and output the optimal cluster structure adjustment strategy for the current scenario, that is, the optimal utility weights.

The optimization goal is a cluster structure adjustment strategy that balances cluster structure stability against system communication cost and thereby achieves the best system operating efficiency. By choosing the cluster structure adjustment strategy, the aim is to maximize system operating efficiency over a long horizon. The whole system acts as a single agent that interacts with its environment through the cluster structure adjustment strategy. The Markov decision model for this problem is given below:

·State: S(t) is the observation of the cluster structure state of the whole system in time interval t. It includes the state and distribution of the heterogeneous UAVs in each cluster, the communication network structure, and the CH communication UAVs. First, at the start of each time interval t, the composition and state S_k(t) of the heterogeneous UAVs in the k-th cluster is defined as the set of observations, in time interval t, of the heterogeneous UAVs in that cluster, where N_k(t) is the number of heterogeneous UAVs in the k-th cluster at time t. Each UAV's observation in time interval t consists of three parts: the UAV's type; its attribute and capability features, including remaining energy, speed, and so on; and its state features, including UAV number, position coordinates, direction, role, and so on.

The network structure E(t) of the system at time t is then defined as an adjacency matrix.

Finally, the state S(t) of the whole system can be expressed as:

S(t) = (S₁(t), S₂(t), …, S_k(t), …, S_{N(t)}(t), E(t))

where N(t) is the number of heterogeneous UAV clusters in the system at time t.

·Action: defines the system's cluster structure adjustment strategy. The strategy to be adopted by the system in time interval t is defined by six utility weights, where λ_i (i ∈ [1,6]) is the utility weight of the i-th UAV factor and the λ_i satisfy the following constraint:

λ₁ + λ₂ + λ₃ + λ₄ + λ₅ + λ₆ = 1
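A minimal Python sketch of how an action could be validated and selected is given below. It assumes a small discrete set of candidate weight vectors (CANDIDATE_STRATEGIES, purely illustrative), Actor outputs in the form of unnormalized scores, and non-negative weights; the source states only the sum-to-one constraint and that the policy gives a probability for each global adjustment strategy.

```python
import math
import random

# Hypothetical discrete action space: each entry is (λ1, ..., λ6).
CANDIDATE_STRATEGIES = [
    (1 / 6,) * 6,
    (0.4, 0.1, 0.1, 0.2, 0.1, 0.1),
    (0.1, 0.1, 0.2, 0.2, 0.2, 0.2),
]


def is_valid_strategy(weights, tol=1e-9):
    """Check λ1 + ... + λ6 = 1 (non-negativity is an added assumption)."""
    return (len(weights) == 6
            and all(w >= 0 for w in weights)
            and abs(sum(weights) - 1.0) <= tol)


def softmax(scores):
    """Turn unnormalized Actor scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_strategy(actor_scores, rng):
    """Sample one candidate strategy according to the policy probabilities."""
    probs = softmax(actor_scores)
    return rng.choices(CANDIDATE_STRATEGIES, weights=probs, k=1)[0]


rng = random.Random(0)
policy = softmax([0.2, 1.5, -0.3])
strategy = select_strategy([0.2, 1.5, -0.3], rng)
```

Sampling from the policy rather than always taking the most probable strategy keeps exploration during training; a deployed system might instead pick the highest-probability candidate.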

·Reward function: r(t) is the global reward obtained when the system, at time t and in state S(t), adopts a strategy to adjust the cluster structure. It combines two terms, where λ is a weight, r_s is the cluster structure stability reward, and r_c is the negative reward caused by energy consumption.

r_s = α₁r_ATT + α₂r_CTD + α₃r_SMD + α₄r_CTC

r_c = -(β₁r_build + β₂r_maintenance + β₃r_isolated + β₄r_fly)

Here α and β are weights. r_ATT is the average time that UAVs remain in their current cluster; r_CTD is the tightness of the network structure between UAVs within each cluster; r_SMD is the motion-state similarity between UAVs within each cluster; r_CTC is the average time that communication connections between UAVs within each cluster are maintained; r_build is the communication cost of establishing the new communication connections caused by the cluster structure adjustment; r_maintenance is the cost of maintaining the communication network of the previous time step up to the current time step; r_isolated is the additional communication loss caused by isolated clusters; and r_fly is the flight energy consumption of all UAVs in the system at the current time step.
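The two weighted sums can be written directly in Python. The sketch below computes only r_s and r_c from already-evaluated terms; the dictionary keys and the numeric values are illustrative assumptions, and the way r_s and r_c are combined into r(t) is left out because that formula is not reproduced in the source text.

```python
STABILITY_TERMS = ("ATT", "CTD", "SMD", "CTC")
COST_TERMS = ("build", "maintenance", "isolated", "fly")


def stability_reward(terms, alpha):
    """r_s = α1·r_ATT + α2·r_CTD + α3·r_SMD + α4·r_CTC."""
    return sum(a * terms[name] for a, name in zip(alpha, STABILITY_TERMS))


def cost_reward(terms, beta):
    """r_c = -(β1·r_build + β2·r_maintenance + β3·r_isolated + β4·r_fly)."""
    return -sum(b * terms[name] for b, name in zip(beta, COST_TERMS))


# Illustrative values only.
r_s = stability_reward({"ATT": 1.0, "CTD": 1.0, "SMD": 1.0, "CTC": 1.0},
                       alpha=(0.25, 0.25, 0.25, 0.25))
r_c = cost_reward({"build": 2.0, "maintenance": 3.0, "isolated": 1.0, "fly": 4.0},
                  beta=(0.1, 0.2, 0.3, 0.4))
```

Because r_c is negated, any increase in connection-building, maintenance, isolation, or flight cost lowers the reward, while r_s rewards structures that stay stable.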

In the expression for r_ATT, N is the number of clusters, N_i is the number of UAVs in cluster C_i, and mt_{i,j} is the time that the j-th UAV in cluster C_i has remained in its current cluster.

In the expression for r_CTD, the distance term is the distance between a UAV in cluster C_i and that cluster's communication UAV CH_i.

In the expression for r_SMD, the similarity term is the motion-state similarity between a UAV in cluster C_i and that cluster's communication UAV CH_i, where speed is the UAV's current speed and θ is the UAV's direction.

In the expression for r_CTC, the time term is the time for which the communication connection between a UAV in cluster C_i and that cluster's communication UAV CH_i has been maintained.

In the expression for r_build, the cost term is the communication cost of establishing a communication connection as computed in 1.4, and I^t is the set of communication connections newly established at time t:

I^t = E(t) - (E(t) ∩ E(t-1))

r_build is thus the communication cost required by the communication connections newly established in the hybrid multi-UAV cluster at time t.
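The set expression for I^t can be checked with a short Python sketch. It assumes undirected connections stored in symmetric 0/1 adjacency matrices and a caller-supplied cost function; edge_set, new_connections, and build_cost are hypothetical helper names.

```python
def edge_set(adjacency):
    """Convert a symmetric 0/1 adjacency matrix into a set of unordered pairs."""
    n = len(adjacency)
    return {frozenset((i, j))
            for i in range(n) for j in range(i + 1, n)
            if adjacency[i][j]}


def new_connections(E_t, E_prev):
    """I^t = E(t) - (E(t) ∩ E(t-1)): connections present now but not before."""
    current, previous = edge_set(E_t), edge_set(E_prev)
    return current - (current & previous)


def build_cost(E_t, E_prev, cost):
    """Sum a caller-supplied connection cost over the newly built connections."""
    total = 0.0
    for edge in new_connections(E_t, E_prev):
        i, j = sorted(edge)
        total += cost(i, j)
    return total


E_prev = [[0, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
E_now = [[0, 1, 1],
         [1, 0, 0],
         [1, 0, 0]]
added = new_connections(E_now, E_prev)
r_build = build_cost(E_now, E_prev, cost=lambda i, j: 2.0)
```

Only the connection between UAVs 0 and 2 is new in this example, so it alone contributes to the building cost; the pre-existing connection between 0 and 1 would be charged under maintenance instead.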

r_maintenance is the communication maintenance cost of all communication connections that existed in the hybrid multi-UAV cluster at the previous time step.

r_isolated = num_isolated

where num_isolated is the number of isolated clusters in the system, that is, clusters that have no cross-cluster communication connection in E with any other cluster. An isolated cluster cannot communicate directly with other clusters and must go through the system base station to obtain their actions and states, so each isolated cluster incurs additional communication loss.
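Counting isolated clusters from the adjacency matrix can be sketched as follows. The sketch assumes E is indexed by UAV and that a list cluster_of (a hypothetical name) gives the cluster index of each UAV.

```python
def count_isolated_clusters(adjacency, cluster_of):
    """Count clusters with no connection in E to a UAV of another cluster."""
    connected = set()
    n = len(adjacency)
    for i in range(n):
        for j in range(n):
            if adjacency[i][j] and cluster_of[i] != cluster_of[j]:
                # A cross-cluster connection links both endpoint clusters.
                connected.add(cluster_of[i])
                connected.add(cluster_of[j])
    return len(set(cluster_of) - connected)


# Five UAVs in three clusters; UAV 4 forms a cluster with no connections.
E = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cluster_of = [0, 0, 1, 1, 2]
num_isolated = count_isolated_clusters(E, cluster_of)
```

Intra-cluster connections are ignored on purpose: a cluster can be densely connected internally and still be isolated from the rest of the system.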

In the expression for r_fly, the per-UAV term is the flight energy consumption of the j-th UAV in cluster C_i at the current time step, which depends on the UAV's type and speed.

On this basis, over the time steps t = 1, 2, …, T of the decision model, the system's policy gives the probability that the system takes a given action in state S(t).

According to the definition of a Markov decision process, the policy of each cluster should satisfy the optimality condition; that is, the system's policy should maximize its reward function r(t) over the action space available to the system, which is the set of all possible utility weights.

3. Cluster structure adjustment algorithm based on reinforcement learning

This patent uses a reinforcement learning algorithm to solve the problem formulated above. A neural network is first built for the Markov decision model and then trained within the Actor-Critic framework, updating the value network parameters and the policy network parameters. So that the neural network can fuse the dynamically changing multi-dimensional heterogeneous data in a hybrid multi-UAV cluster, the construction of the neural network is divided into two levels: the type-specific state encoder for heterogeneous UAVs, and the hybrid multi-UAV cluster structure.

3.1 Type-specific state encoder for heterogeneous UAVs

First, the attribute and capability features of heterogeneous UAVs, which differ in dimension, must be processed. The system contains n types of heterogeneous UAVs, and a separate network is trained for each type to process its features. For the i-th UAV in the k-th cluster at time t, the UAV's observation is taken from S_k(t) in the raw input system state S(t) = (S₁(t), S₂(t), …, S_k(t), …, S_{N(t)}(t), E(t)), and its type and its attribute and capability features are extracted from that observation, where the type is the type number of the heterogeneous UAV.

This patent uses a GRU to design the type-specific state encoder. The encoder's inputs are the type of the heterogeneous UAV and its attribute and capability features of differing dimensions; its output is a feature vector of a unified dimension. The computation of this output is described below.

记输入的无人机的类型编号为type，GRU中存储的无人机的历史(隐藏)状态是一个向量，用于表示无人机在上一时刻的历史特征。在处理时，我们可以将加入到输入数据中。我们使用针对type类型无人机的两个门控：重置门r^type和更新门u^type，重置门用于控制是否忽略GRU的隐藏状态它的计算公式如下：Remember to input the drone Type number For type, the drone stored in GRU History (hidden) status is a vector representing the drone Historical features at the last moment. When we can Add to the input data. We use two gates for type type drones: reset gate r ^type and update gate u ^type . The reset gate is used to control whether to ignore the hidden state of GRU Its calculation formula is as follows:

其中type是无人机的无人机类型编号，和是处理type类型无人机状态的GRU网络中输入特征和隐藏状态对r^type的权重矩阵，是偏置向量，σ是sigmoid函数。Type is drone The drone type number, and is the weight matrix of the input features and hidden states for r ^type in the GRU network that processes the type type drone state. is the bias vector and σ is the sigmoid function.
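Read against these definitions, the reset gate has the shape of a standard GRU reset gate. The following is a sketch under that assumption; the symbols x (input features of the UAV), h_prev (hidden state from the previous time step), and W_r, U_r, b_r (the two weight matrices and the bias) are names introduced here for illustration and are not taken from the patent text.

```latex
r^{type} = \sigma\!\left( W_r^{type}\, x + U_r^{type}\, h_{prev} + b_r^{type} \right)
```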

The update gate u^type controls the degree to which the GRU updates the hidden state. Its calculation formula is as follows:

where the two weight matrices apply to the input features and to the hidden state, respectively, for u^type in the GRU network that processes the state of UAVs of the given type; the remaining term is the bias vector; and σ is the sigmoid function.
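The update gate described here matches the usual GRU update gate. A sketch under that assumption, with x for the input features, h_prev for the previous hidden state, and W_u, U_u, b_u for the weights and bias (all of these symbol names are illustrative, not the patent's own notation):

```latex
u^{type} = \sigma\!\left( W_u^{type}\, x + U_u^{type}\, h_{prev} + b_u^{type} \right)
```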

The candidate feature at the current moment is calculated from r^type and u^type:

where the reset gate vector and the hidden state vector are multiplied element by element; the two weight matrices apply to the input and to the hidden state, respectively, for the candidate feature in the GRU network that processes the state of UAVs of the given type; the remaining term is the bias vector; and tanh is the hyperbolic tangent function, which maps its input to the interval [-1, 1].
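In a standard GRU the candidate feature is formed from the input and the reset-gated hidden state, and the update gate enters only in the later hidden-state update. A sketch of that standard form, which may differ in detail from the patent's formula; the names x, h_prev, \tilde{h}, W_h, U_h and b_h are assumptions introduced for illustration:

```latex
\tilde{h} = \tanh\!\left( W_h^{type}\, x + U_h^{type} \left( r^{type} \odot h_{prev} \right) + b_h^{type} \right)
```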

Finally, a fully connected layer maps the processed candidate feature to the uniform dimension:

where H is a nonlinear activation function; the dimension mapping matrix is dedicated to this type of UAV and is a weight matrix with Z rows and D columns, where Z is the dimension of the output feature vector and D is the dimension of the candidate feature; and b_z is the bias vector. This yields the feature vector of the heterogeneous UAV as processed by the type-specific state encoder. The hidden state is then updated according to u^type and serves as the hidden state at the next time step.
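The encoder step can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the parameter names, the random initialisation, the choice of tanh for the activation H, the interpolation used for the hidden-state update, and the type names "scout" and "relay" are all assumptions. It shows how two UAV types with different input dimensions end up with feature vectors of the same dimension.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vecs):
    """Element-wise sum of equally long vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def init_type_params(in_dim, hidden_dim, out_dim, rng):
    """Random weights for one UAV type (all names are illustrative)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in ("r", "u", "h"):
        p["W_" + gate] = mat(hidden_dim, in_dim)      # input weights
        p["U_" + gate] = mat(hidden_dim, hidden_dim)  # hidden-state weights
        p["b_" + gate] = [0.0] * hidden_dim
    p["W_z"] = mat(out_dim, hidden_dim)  # Z rows, D columns
    p["b_z"] = [0.0] * out_dim
    return p


def encode_step(p, x, h_prev):
    """One encoder step: returns (uniform-dimension feature, next hidden state)."""
    r = sigmoid(add(matvec(p["W_r"], x), matvec(p["U_r"], h_prev), p["b_r"]))
    u = sigmoid(add(matvec(p["W_u"], x), matvec(p["U_u"], h_prev), p["b_u"]))
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = tanh(add(matvec(p["W_h"], x), matvec(p["U_h"], gated), p["b_h"]))
    z = tanh(add(matvec(p["W_z"], cand), p["b_z"]))  # H assumed to be tanh
    # Assumed update rule: interpolate between old state and candidate.
    h_next = [(1.0 - ui) * hi + ui * ci for ui, hi, ci in zip(u, h_prev, cand)]
    return z, h_next


rng = random.Random(0)
params = {
    "scout": init_type_params(3, 4, 6, rng),  # 3 raw attributes
    "relay": init_type_params(5, 4, 6, rng),  # 5 raw attributes
}
z_a, h_a = encode_step(params["scout"], [0.2, 0.5, 0.1], [0.0] * 4)
z_b, h_b = encode_step(params["relay"], [0.9, 0.1, 0.3, 0.7, 0.4], [0.0] * 4)
```

Keeping one parameter set per type is what lets each network specialise in its own attribute layout while the shared output dimension keeps the downstream network type-agnostic.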

Implementing this state encoder completes the processing of the attribute and capability features of heterogeneous UAVs in different dimensions. The UAV type, the processed feature vector of uniform dimension, and the UAV's state features are then concatenated to obtain an observation feature of the same dimension for every heterogeneous UAV.

3.2 Modeling of the hybrid multi-UAV cluster structure

Within each cluster, one CH (communication UAV) establishes communication connections with all other CMs (member UAVs), and communication connections may also exist between UAVs in different clusters. Together these form the communication network of the hybrid multi-UAV cluster, which is represented by the adjacency matrix E obtained from the perception process. To train the system as a single "agent", the structure of the hybrid multi-UAV cluster must be modeled, which requires fusing the heterogeneous UAV types, the processed feature vectors of uniform dimension, the UAV state features, and the communication network structure within the system.

This patent uses a graph attention network (GAT) to model the hybrid multi-UAV cluster structure. The inputs of the GAT are the same-dimension observation features of each heterogeneous UAV obtained in 3.1 and the communication network structure E(t) of the hybrid multi-UAV cluster system; the output is the observation O(t) of the entire hybrid multi-UAV cluster system after processing by the graph attention network.

The number l of each UAV is obtained, and x_l(t) denotes the observation feature of the heterogeneous UAV numbered l.

The communication network structure E(t) of the system supplies the edges of the graph (the network structure features), and a self-attention mechanism aggregates the observations of all heterogeneous UAVs by weighted averaging to obtain the observation of the entire system.

The specific implementation process and formulas are as follows:

Given the communication network structure E(t) of the system at time t (the adjacency matrix representing the communication network), each entry e_ij(t) of E(t) is used as an edge of the graph attention network: e_ij(t) is 1 if there is a communication connection between the UAVs numbered i and j, and 0 otherwise. As a special case, e_ii(t) is 1.

The attention weights between the UAVs in the system are then calculated as follows:

where N(t) is the number of non-isolated UAVs in the cluster structure at time t and W is the shared weight matrix. The feedforward neural network takes a dot product with the concatenated feature vector to produce a real number, which passes through the LeakyReLU activation function and is finally normalized to give the attention weight α_ij(t) at time t. This weight expresses the relative importance of the heterogeneous UAV numbered j to the heterogeneous UAV numbered i.
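The description above follows the usual graph attention formulation. A sketch of that standard form is given below; the symbol a for the weight vector of the feedforward network and the intermediate score s_ij(t) are names introduced here, and restricting the normalization to neighbors with e_ik(t) = 1 is the common GAT masking, assumed rather than confirmed by the text (the patent's own formula refers to N(t)).

```latex
s_{ij}(t) = \mathrm{LeakyReLU}\!\left( a^{\top} \left[\, W x_i(t) \,\|\, W x_j(t) \,\right] \right),
\qquad
\alpha_{ij}(t) = \frac{\exp\!\left( s_{ij}(t) \right)}{\sum_{k:\, e_{ik}(t) = 1} \exp\!\left( s_{ik}(t) \right)}
```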

The new observation feature vector is then calculated from the attention weights as follows:

where m is the number of heterogeneous UAVs in the system. The weighted sum gives the new observation feature vector X_i(t) of the heterogeneous UAV numbered i. Finally, the observation features of all UAVs processed by the self-attention mechanism are concatenated to obtain the system observation X(t):

X(t) = (X₁(t) || X₂(t) || … || X_i(t) || … || X_m(t))

where m is the total number of UAVs in the system. X(t) is processed by a fully connected layer and aggregated into the final observation feature O(t) of the hybrid multi-UAV cluster system.
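The attention and aggregation steps can be sketched in plain Python as follows. This is a minimal single-head illustration under assumptions: the function and parameter names are hypothetical, the LeakyReLU slope of 0.2 is a common default rather than a value from the patent, attention is normalized over the neighbors marked in the adjacency matrix, and the final fully connected layer that produces O(t) is omitted.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v


def gat_aggregate(features, adjacency, W, a):
    """Return one attention-weighted feature vector per UAV.

    features:  list of m same-dimension observation vectors x_i(t)
    adjacency: m x m matrix of 0/1 entries with ones on the diagonal
    W:         shared weight matrix
    a:         weight vector of the scoring network (length 2 * output dim)
    """
    projected = [matvec(W, x) for x in features]
    outputs = []
    for i, hi in enumerate(projected):
        neighbours = [j for j, e in enumerate(adjacency[i]) if e == 1]
        # Score each neighbour from the concatenated pair [W x_i || W x_j].
        scores = [
            leaky_relu(sum(w * v for w, v in zip(a, hi + projected[j])))
            for j in neighbours
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        outputs.append([
            sum(al * projected[j][d] for al, j in zip(alphas, neighbours))
            for d in range(len(hi))
        ])
    return outputs


features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
adjacency = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # UAV 2 only sees itself
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, 0.2, 0.3, 0.4]
new_features = gat_aggregate(features, adjacency, W, a)
X = [v for row in new_features for v in row]  # concatenation X(t)
```

Because e_ii(t) is 1, every UAV has at least itself as a neighbor, so the normalization never divides by zero.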

3.3 Model training

The reinforcement learning model is trained with the Actor-Critic architecture. The hybrid multi-UAV cluster system is regarded as a single agent, and the trained Actor policy network is deployed for it; the Actor network gives the action the system will take at the current moment, that is, the cluster structure adjustment strategy. The Critic value network is used during training to estimate the value of the action.

·Actor policy network

The Actor network selects an action according to the current system state O(t) and outputs a policy. The policy parameters θ are optimized by gradient ascent to improve the performance of the policy and yield a better cluster structure adjustment strategy.

·Critic value network

Based on the observation O(t) of the hybrid multi-UAV cluster system output by the neural network and the policy output by the Actor network, the Critic network calculates the value function of the current state and outputs the expected return of the action taken in the current state.

·Policy gradient update

Training uses the PPO algorithm, with the following policy gradient formula:

where θ denotes the policy parameters, T is the total running time of the system, and r_t(θ) is the probability ratio between the policy updated by training and the old policy; ε limits the change in this ratio so that the policy changes smoothly. The advantage function is calculated from the reward function and the value function given when the decision model was constructed.
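The quantities named here (the probability ratio r_t(θ), the clip range ε, the advantage, and an average over T steps) match the standard clipped surrogate objective of PPO. A minimal sketch of that standard objective follows; the function name and the default ε of 0.2 are assumptions, and the text does not confirm that the patent's formula agrees with it term by term.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Average over time steps of min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)."""
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equally long")
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - eps, 1 + eps] in the direction the advantage favours.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


# A ratio above the clip range with a positive advantage is capped at 1 + eps.
capped = ppo_clip_objective([1.5], [1.0], epsilon=0.2)
```

The objective is maximized by gradient ascent on θ; in practice the ratios and advantages come from trajectories collected under the old policy.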

After model training is complete, the method can give a cluster structure adjustment strategy suited to each scenario, balancing the communication cost of the system, the state of the heterogeneous UAVs, and the stability of the network structure.

3.4 Cluster structure adjustment and network reconstruction

According to the cluster structure adjustment strategy action(t) given by the trained model, the utility value of each UAV is obtained as the weighted sum of its six factors. The six factors are designed as follows:

·Remaining energy percentage

The remaining energy attribute is extracted from the UAV observation x_i(t) obtained in 3.1 and divided by the initial energy in x_i(0).

·Degree of the UAV in the network structure

The degree of the UAV divided by the total number of other UAVs in the system.

·Average distance between the UAV and its neighboring UAVs (negative utility)

N_i(t) is the set of numbers of the neighboring UAVs at time t, and d_ij is the distance between two UAVs, calculated from the UAV position coordinates in the observations x_i(t) and x_j(t).

·Average motion state similarity between the UAV and its neighboring UAVs

N_i(t) is the set of numbers of the neighboring UAVs at time t; speed is the UAV speed attribute aggregated from the UAV observations x_i(t) and x_j(t) obtained in 3.1; θ is the direction in the state features of the observation; V_ij(t) is the degree of similarity between the motion states of two UAVs; the mean is the average of this similarity over the neighboring UAVs; and taking the variance gives the average motion state similarity factor₄.

·Average holding time of communication connections

N_i(t) is the set of numbers of the neighboring UAVs at time t, and ct_ij(t) is the time for which the communication connection between the two UAVs has been maintained at time t.

·Average communication cost between the UAV and its neighboring UAVs

N_i(t) is the set of numbers of the neighboring UAVs at time t, and cost_ij(t) is the communication cost as calculated in the perception process: the UAVs corresponding to the numbers i and j are looked up, and the communication cost between the two is calculated from their heterogeneous UAV types and the distance between them, which reflects the multi-dimensional heterogeneity of communication capability.
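The weighted sum over the six factors can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, only the degree factor and a generic neighbor average are computed from data, the remaining factor values are placeholders, and the sign convention (negating the distance and cost factors before summing) is a choice made here, since the text marks only the distance factor as negative utility.

```python
import math


def degree_factor(i, adjacency):
    """Degree of UAV i divided by the number of other UAVs in the system."""
    m = len(adjacency)
    degree = sum(1 for j, e in enumerate(adjacency[i]) if e == 1 and j != i)
    return degree / (m - 1) if m > 1 else 0.0


def mean_over_neighbors(i, adjacency, value):
    """Average value(i, j) over the neighbors j of UAV i (0.0 if it has none)."""
    nbrs = [j for j, e in enumerate(adjacency[i]) if e == 1 and j != i]
    return sum(value(i, j) for j in nbrs) / len(nbrs) if nbrs else 0.0


def utility(factors, weights):
    """Weighted sum of the six factors; the weights must sum to 1."""
    if len(factors) != 6 or len(weights) != 6:
        raise ValueError("expected six factors and six weights")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * f for w, f in zip(weights, factors))


positions = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
adjacency = [[1, 1, 1], [1, 1, 0], [1, 0, 1]]


def distance(i, j):
    (x1, y1), (x2, y2) = positions[i], positions[j]
    return math.hypot(x1 - x2, y1 - y2)


avg_distance = mean_over_neighbors(0, adjacency, distance)
factors = [
    0.8,                          # remaining energy percentage (placeholder)
    degree_factor(0, adjacency),  # degree in the network structure
    -avg_distance,                # average neighbor distance, negated
    0.6,                          # motion state similarity (placeholder)
    0.4,                          # connection holding time (placeholder)
    -0.3,                         # communication cost, negated (placeholder)
]
weights = [0.2, 0.2, 0.1, 0.2, 0.2, 0.1]  # the action: six weights summing to 1
u0 = utility(factors, weights)
```

In this example the raw distance dominates the other factors, which suggests that in practice the factors would need to be brought to comparable scales before weighting; the text does not say how this is done.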

A clustering algorithm clusters the UAVs according to the utility values of the UAVs within communication range and selects the CH communication UAVs, dividing the system into clusters. A routing algorithm then builds the communication network within each cluster through its communication UAV. Every UAV in the system sends out a signal carrying its cluster number and then receives the signals sent by other UAVs. From a received signal, a UAV learns which cluster the signal source belongs to. UAVs that can receive signals from UAVs in other clusters become candidates for cross-cluster communication connections, and the candidate with the lowest communication cost is selected and established as the cross-cluster communication connection between the two clusters. This completes the cluster structure adjustment process.
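The last step, choosing one cross-cluster connection per pair of clusters, can be sketched as follows. The function name, the data layout (a mapping from UAV number to cluster number and a list of receiver/sender pairs), and the toy cost function are assumptions made for illustration.

```python
def select_cross_cluster_links(cluster_of, heard, cost):
    """Pick the cheapest candidate link for every pair of clusters.

    cluster_of: dict mapping UAV number -> cluster number
    heard:      iterable of (receiver, sender) pairs, one per received signal
    cost:       function (uav_a, uav_b) -> communication cost
    """
    best = {}
    for receiver, sender in heard:
        ca, cb = cluster_of[receiver], cluster_of[sender]
        if ca == cb:
            continue  # same cluster: not a cross-cluster candidate
        key = (min(ca, cb), max(ca, cb))
        c = cost(receiver, sender)
        if key not in best or c < best[key][0]:
            best[key] = (c, receiver, sender)
    return {key: (a, b) for key, (_, a, b) in best.items()}


cluster_of = {1: 0, 2: 0, 3: 1, 4: 1}
heard = [(2, 3), (1, 4), (1, 2)]
links = select_cross_cluster_links(cluster_of, heard, lambda a, b: abs(a - b))
```

Keying the result on the unordered cluster pair ensures that a signal heard in either direction competes for the same single connection.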

Claims

1. A method for adaptively adjusting the structure of a hybrid multi-UAV cluster, characterized in that a state encoder for heterogeneous UAV types is first proposed to distinguish and process different types of UAVs, aggregate the attributes and capabilities of heterogeneous UAVs in different dimensions in a hybrid scenario, and obtain heterogeneous UAV features with the same dimension; then, a clustering algorithm based on reinforcement learning is used to fuse the processed multi-dimensional heterogeneous data, and a global adjustment strategy is given to adjust the cluster structure and reconstruct the communication network.

2. The method for adaptively adjusting the structure of a hybrid multi-UAV cluster according to claim 1, characterized in that: a hybrid multi-UAV cluster system with uniformly distributed UAVs is adopted, and any UAV in the k-th cluster C_k is represented by a triple whose first element is the type of the heterogeneous UAV in the k-th cluster of the system, whose second element is the attributes and capabilities of the heterogeneous UAV, and whose third element is the state features of the heterogeneous UAV;

The hybrid multi-UAV cluster system consists of multiple heterogeneous UAV clusters C_k, each of which is composed of heterogeneous UAVs, where N_k is the number of UAVs in the cluster; there are two types of connection between heterogeneous UAVs in the system, communication connections within a cluster and communication connections across clusters; the communication connections between heterogeneous UAVs constitute the hybrid network structure within the system; both types of connection are dynamic and change as the cluster structure is adjusted; combining the intra-cluster and cross-cluster connections of the heterogeneous multi-UAV system, the hybrid multi-UAV cluster system MAS with N clusters is described as MAS = (n_1, n_2, …, n_N, E), where n_k = <C_k, CH_k> represents the complete perception of the k-th cluster, C_k is the set of heterogeneous UAVs in the k-th cluster, CH_k is the current communication UAV of the k-th cluster, and E is the adjacency matrix representing the network structure within the hybrid multi-UAV cluster system; a two-tuple of UAVs represents a communication connection in the system; the communication cost of every communication connection, both when it is established and at each subsequent moment while it is maintained, is expressed as the communication cost between a heterogeneous UAV in cluster C_k and a heterogeneous UAV in cluster C_l; the function c calculates this cost from the types of the two UAVs and the distance between them.

3. The method for adaptively adjusting the structure of a hybrid multi-UAV swarm according to claim 2, characterized in that:

The state S(t) represents the cluster structure state observation of the entire system at time interval t, including the state, distribution, communication network structure and CH communication drones of the heterogeneous drones in each cluster, which is expressed as:

S(t) = (S₁(t), S₂(t), …, S_k(t), …, S_{N(t)}(t), E(t))

where N(t) is the number of heterogeneous UAV clusters in the system at time t, E(t) is the communication network structure of the system at time t, and S_k(t) is the set of observations, at time interval t, of the heterogeneous UAVs in the k-th cluster:

The action action(t) is the cluster structure adjustment strategy adopted by the system at time t, which is expressed as:

where λ_i (i ∈ [1, 6]) is the utility weight of the i-th factor of the UAV, and the λ_i satisfy the following constraint:

λ₁ + λ₂ + λ₃ + λ₄ + λ₅ + λ₆ = 1

The reward r(t) is the global reward that the system obtains by taking an action at time t in state S(t) to adjust the cluster structure, which is expressed as:

where λ is a weight, r_s is the reward for cluster structure stability, and r_c is the negative reward caused by energy consumption;

r_s = α₁ r_ATT + α₂ r_CTD + α₃ r_SMD + α₄ r_CTC

r_c = -(β₁ r_build + β₂ r_maintenance + β₃ r_isolated + β₄ r_fly)

where α and β are weights; r_ATT is the average time that the UAVs stay in their current cluster; r_CTD is the tightness of the network structure between the UAVs in each cluster; r_SMD is the similarity of the motion states of the UAVs in each cluster; r_CTC is the average time for which the communication connections between the UAVs in each cluster are maintained; r_build is the communication cost of building the new communication connections required by the cluster structure adjustment; r_maintenance is the maintenance cost of the communication network in the system from the previous moment to the current moment; r_isolated is the additional communication loss caused by isolated clusters among all clusters; and r_fly is the flight energy consumption of all UAVs in the system at the current moment;

According to the definition of the Markov decision process, the strategy of each cluster should meet the optimization condition, that is, the strategy of the system should maximize its reward function r(t), which is:

where the action space that the system can take is the set of utility weights of all possible actions.

4. The method for adaptively adjusting the structure of a hybrid multi-UAV cluster according to claim 1, characterized in that: a state encoder for heterogeneous UAV types is designed using a GRU; the inputs of the state encoder are the type of the heterogeneous UAV and its attribute and capability features, whose dimension differs from type to type, and the output is a feature vector of uniform dimension; the type number of the input UAV is denoted type, and the historical state of the UAV stored in the GRU is a vector representing the UAV's historical features at the previous time step; when the attribute and capability features are processed, the historical state is added to the input data;

The reset gate r^type of the GRU controls whether the hidden state of the GRU is ignored, and its formula is:

where type is the type number of the UAV; the two weight matrices apply to the input features and to the hidden state, respectively, for r^type in the GRU network that processes the state of UAVs of the given type; the remaining term is the bias vector; and σ is the sigmoid function;

The update gate u^type of the GRU controls the degree to which the GRU updates the hidden state, and its formula is:

where the two weight matrices apply to the input features and to the hidden state, respectively, for u^type in the GRU network that processes the state of UAVs of the given type; the remaining term is the bias vector; and σ is the sigmoid function.

5. The method for adaptively adjusting the structure of a hybrid multi-UAV cluster according to claim 4, characterized in that: the candidate feature at the current moment is calculated from r^type and u^type:

where the reset gate vector and the hidden state vector are multiplied element by element; the two weight matrices apply to the input and to the hidden state, respectively, for the candidate feature in the GRU network that processes the state of UAVs of the given type; the remaining term is the bias vector; and tanh is the hyperbolic tangent function, which maps its input to the interval [-1, 1]; a fully connected layer maps the processed candidate feature to the uniform dimension:

where H is a nonlinear activation function; the dimension mapping matrix is dedicated to this type of UAV and is a weight matrix with Z rows and D columns, where Z is the dimension of the output feature vector and D is the dimension of the candidate feature; and b_z is the bias vector; this yields the feature vector of the heterogeneous UAV as processed by the type-specific state encoder; the UAV type, the processed feature vector of uniform dimension, and the UAV's state features are concatenated to obtain the same-dimension observation feature of the heterogeneous UAV numbered l.

6. The method for adaptively adjusting the structure of a hybrid multi-UAV cluster according to claim 5, characterized in that: according to the communication network structure E(t) of the system at time t, that is, the adjacency matrix representing the communication network structure of the system, each entry e_ij(t) of the adjacency matrix E(t) is used as an edge of the graph attention network, and the attention weights between the UAVs in the system are calculated by the formula:

where N(t) is the number of non-isolated UAVs in the cluster structure at time t and W is the shared weight matrix; the feedforward neural network takes a dot product with the concatenated feature vector to produce a real number, which passes through the LeakyReLU activation function and is finally normalized to give the attention weight α_ij(t) at time t, which expresses the relative importance of the heterogeneous UAV numbered j to the heterogeneous UAV numbered i;

The new observation feature vector is calculated from the attention weights by the formula:

where m is the number of heterogeneous UAVs in the system; the weighted sum gives the new observation feature vector X_i(t) of the heterogeneous UAV numbered i, and finally the observation features of all UAVs processed by the self-attention mechanism are concatenated to obtain the system observation X(t):

X(t) = (X₁(t) || X₂(t) || … || X_i(t) || … || X_m(t))

X(t) is processed by a fully connected layer and aggregated into the final observation feature O(t) of the hybrid multi-UAV cluster system.

7. The method for adaptively adjusting the structure of a hybrid multi-UAV cluster according to claim 6, characterized in that: an Actor-Critic architecture is used to train the reinforcement learning model; the hybrid multi-UAV cluster system is regarded as a single agent, and the trained Actor policy network is deployed for it; the Actor network selects an action according to the current system state O(t) and outputs a policy, and the policy parameters θ are optimized by gradient ascent to give a better cluster structure adjustment strategy; based on the observation O(t) of the hybrid multi-UAV cluster system output by the neural network and the policy output by the Actor network, the Critic network calculates the value function of the current state and outputs the expected return of the action taken in the current state; the advantage function is then calculated, and the policy gradient and the value function are updated.

8. The method for adaptively adjusting the structure of a hybrid multi-UAV cluster according to claim 7, characterized in that: according to the action taken by the system, that is, the utility weights for cluster structure adjustment, the utility of each UAV is calculated as the weighted sum of its factors; each UAV has six utility factors: the remaining energy percentage, the degree of the UAV in the network structure, the average distance between the UAV and other UAVs, the average motion state similarity between the UAV and its neighboring UAVs, the average holding time of communication connections, and the average communication cost between the UAV and its neighboring UAVs; a clustering algorithm clusters the UAVs according to the utility values of the UAVs within communication range and selects the CH communication UAVs to divide the system into clusters, and a routing algorithm then builds the communication network within each cluster through its communication UAV CH, completing the cluster structure adjustment process.