CN117311390B - Intelligent combined guidance method for closed-loop tracking of aerospace shuttle aircraft

Info

Publication number: CN117311390B
Application number: CN202311463606.4A
Authority: CN
Inventors: 张秀云; 于卉; 宗群; 李智禹; 张睿隆
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2023-11-06
Filing date: 2023-11-06
Publication date: 2024-03-19
Anticipated expiration: 2043-11-06
Also published as: CN117311390A

Abstract

The invention belongs to the technical field of closed-loop tracking intelligent combined guidance for aerospace round-trip vehicles, and specifically relates to a closed-loop tracking intelligent combined guidance method for an aerospace round-trip vehicle, comprising the following steps: S1: design an LSTM-based aircraft trajectory prediction algorithm; S2: establish the RLV reentry-segment error model and transform the constrained control problem; S3: design an aircraft tracking controller based on adaptive dynamic programming. The invention adopts a prediction-correction framework and designs the controller by combining LSTM-based prediction with adaptive dynamic programming. This ensures that the selected performance index function reaches its optimum within a finite time horizon, yields the optimal feedback guidance law, and improves the autonomy of the controller.

Description

A closed-loop tracking intelligent combined guidance method for an aerospace round-trip vehicle

Technical Field

The present invention belongs to the technical field of closed-loop tracking intelligent combined guidance for aerospace round-trip vehicles, and in particular relates to a closed-loop tracking intelligent combined guidance method for an aerospace round-trip vehicle.

Background Art

A reusable launch vehicle (RLV) is a multi-purpose, reusable vehicle that can travel freely between the Earth's surface and space orbit. It represents an inevitable trend toward fast, reliable and low-cost access to space, and is a current research focus in the aerospace field. The RLV reentry phase is characterized by rapidly changing speed, strong coupling, model uncertainty and external environmental disturbances, which makes the design of the reentry control system more challenging. To ensure safe and stable reentry flight, optimization of the reentry trajectory and design of the guidance law are particularly critical. The goal of RLV reentry trajectory optimization is to control the flight trajectory toward an optimal objective while satisfying state constraints, control constraints and other conditions. Unlike standard trajectory guidance, the prediction-correction guidance method does not rely on a reference trajectory. Instead, it first predicts the flight endpoint during flight and designs the controller from the deviation between the predicted landing point and the desired endpoint. It offers higher flexibility and landing accuracy, does not depend on the initial reentry state, and is more robust to initial reentry disturbances, so it is increasingly becoming the direction of research in many countries.

Aircraft trajectory prediction is one of the indispensable functional components of an intelligent flight system. In a complex game environment, predicting the trajectory of an aircraft in advance provides a reference for subsequent maneuvering decisions. Trajectory prediction means estimating the trajectory at future moments from existing information, according to certain rules or methods.

Existing techniques fall into two categories: kinematic model methods and data-based methods, of which the former is more widely used. For example, in 2017, Wei Xiqing et al. of Harbin Institute of Technology addressed the periodic skipping motion of hypersonic vehicles by proposing an extended Kalman filter combined with the Singer model for state estimation, from which the target trajectory is propagated recursively. In 2018, Zhang Kai et al. of the Air Force Early Warning Academy constructed features of target motion and flight intention, derived the flight model iteratively using Bayesian theory, and then achieved trajectory prediction with Monte Carlo sampling. Although these methods are readily interpretable, their limited prediction accuracy and long prediction time restrict them to certain specific scenarios.

In summary, prior-art guidance methods cannot ensure that the selected performance index function reaches its optimum within a finite time horizon and cannot effectively obtain the optimal feedback guidance law, which results in low controller autonomy.

Summary of the Invention

The purpose of the present invention is to provide a closed-loop tracking intelligent combined guidance method for an aerospace round-trip vehicle. The method adopts a prediction-correction framework and designs the controller by combining LSTM-based prediction with adaptive dynamic programming. It ensures that the selected performance index function reaches its optimum within a finite time horizon, yields the optimal feedback guidance law, and improves the autonomy of the controller.

The technical solution adopted by the present invention is as follows:

A closed-loop tracking intelligent combined guidance method for an aerospace round-trip vehicle comprises the following steps:

S1: Design an LSTM-based aircraft trajectory prediction algorithm;

S2: Establish the RLV reentry-segment error model and transform the constrained control problem;

S3: Design an aircraft tracking controller based on adaptive dynamic programming.

Furthermore, S1 comprises the following steps:

S101: Build an information data set and database for the RLV trajectory prediction problem by combining, at each time step, the system state data with the error data relative to the endpoint and applying data preprocessing operations;

S102: Obtain the coupling information between the environment and the system from the input data, build an LSTM network for state prediction, and update the prediction network weights with the back-propagation algorithm to obtain the predicted state model of the system, achieving real-time prediction of the system state trajectory at each time step;

S103: During flight, use the obtained prediction model to continuously predict the flight endpoint, take the deviation between the predicted landing point and the desired endpoint as the control error, and feed it to the controller to adjust the control quantity.
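The predict, compare and correct cycle of S103 can be sketched as a simple loop. This is only a minimal illustration under stated assumptions: the names `predictor_corrector_loop`, `predict_endpoint`, `controller` and `step` are hypothetical placeholders, and the toy scalar plant in the usage example stands in for the LSTM predictor, the adaptive dynamic programming controller and the vehicle dynamics.

```python
from typing import Callable

import numpy as np


def predictor_corrector_loop(
    state: np.ndarray,
    target: np.ndarray,
    predict_endpoint: Callable[[np.ndarray], np.ndarray],
    controller: Callable[[np.ndarray], np.ndarray],
    step: Callable[[np.ndarray, np.ndarray], np.ndarray],
    n_steps: int,
) -> np.ndarray:
    """Run n_steps of predict -> compare -> correct and return the final state."""
    for _ in range(n_steps):
        predicted = predict_endpoint(state)  # predict the flight endpoint
        error = predicted - target           # deviation from the desired endpoint
        u = controller(error)                # controller adjusts the control quantity
        state = step(state, u)               # propagate the (toy) plant one step
    return state


# Toy usage (all three callables are illustrative stand-ins, not the patent's models):
final = predictor_corrector_loop(
    state=np.array([10.0]),
    target=np.array([2.0]),
    predict_endpoint=lambda x: x,       # assume the endpoint equals the current state
    controller=lambda e: -0.5 * e,      # simple proportional correction
    step=lambda x, u: x + u,            # integrator plant
    n_steps=20,
)
```

In this toy setting the error halves at every step, so the state converges to the target; the real method replaces each callable with the corresponding learned or modelled component.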

Furthermore, the data preprocessing in S101 includes information fusion and feature extraction.

Furthermore, the method for establishing the information data set in S101 comprises the following steps:

S10101: Information acquisition: obtain the time-varying state information of the aircraft by means of sensors, radar and the like;

S10102: Data preprocessing: preprocess the data using zero-mean standardization;

S10103: Training sample construction: decompose the trajectory data into training samples and labels.

Furthermore, constructing the training samples comprises the following steps:

Starting from the first trajectory point in the data set and proceeding in chronological order, the aircraft state information at the first 20 trajectory points is selected to predict the state information of the next trajectory point, where the state information of each time step is the input of the corresponding cell of the neural network. With a separation interval of 1, the next training sample is selected in the same way starting from the second trajectory point.

Furthermore, S102 comprises the following steps:

S10201: Take the state error information of the aircraft as input and map the input data to a new space through an embedding function;

S10202: Take the historical state information of the aircraft as input and fuse the error information with the historical state information of the aircraft;

S10203: Use the LSTM network to predict the future trajectory of the aircraft from the observed historical state information and the fused information;

S10204: Construct the prediction model from the sample data, normalize the training samples and prediction samples, input the training samples into the network model to train the network, adjust the network structure according to the loss, and test the network performance on the test samples to obtain the prediction results.

Furthermore, the information data set is 14-dimensional. It comprises the state information of the aircraft and the state difference information between the current state and the terminal state of the aircraft, the latter including the geocentric distance difference, longitude difference, latitude difference, velocity difference, flight-path angle difference, heading angle difference and bank angle difference.

Furthermore, S2 comprises the following steps:

An RLV reentry-segment error model is established, and a performance index function is designed that simultaneously reflects the formation error, the control quantity and the collision avoidance effect. The collision avoidance problem in the scenario is converted into a constraint problem through a safety barrier function, and the collision avoidance control problem is thereby converted into a stabilization control problem for the error system.

Furthermore, S3 comprises the following steps:

A control algorithm based on adaptive dynamic programming is designed. A critic network is constructed to approximate the optimal performance index function and solve for the optimal control policy. The policy gradient method is used to update the norm of all neural network weights, and the network output is iterated until the optimal control policy is obtained.

Furthermore, the state of the RLV must strictly satisfy the following constraints:

1). The aircraft state quantity x in the control algorithm satisfies the initial state condition x_0 and the terminal state condition x_f;

2). Owing to the performance limits of the aircraft, during reentry the control quantity u satisfies the constraint u_min ≤ u ≤ u_max and the state quantity x satisfies the constraint x_min ≤ x ≤ x_max, where u_min and u_max are the lower and upper bounds of the control quantity, and x_min and x_max are the lower and upper bounds of the state quantity.

The technical effects achieved by the present invention are as follows:

The closed-loop tracking intelligent combined guidance method for an aerospace round-trip vehicle of the present invention adopts a prediction-correction framework and designs the controller by combining LSTM-based prediction with adaptive dynamic programming, which solves the "curse of dimensionality" problem of traditional dynamic programming control algorithms. The guidance law is iteratively updated through learning, so that the selected performance index function is ultimately optimal within a finite time horizon, the optimal feedback guidance law is obtained, and the autonomy of the controller is improved.

Brief Description of the Drawings

FIG. 1 is a block diagram of the closed-loop tracking intelligent combined guidance of the aerospace round-trip vehicle based on LSTM and adaptive dynamic programming according to the present invention;

FIG. 2 is a diagram of the LSTM aircraft prediction model of the present invention;

FIG. 3 is a schematic diagram of the LSTM of the present invention;

FIG. 4 is a diagram of the aircraft state prediction results of the present invention;

FIG. 5 is a graph of the absolute percentage error of the aircraft state prediction of the present invention;

FIG. 6 is a graph of the state variation under tracking control based on adaptive dynamic programming of the present invention;

FIG. 7 is a graph of the error variation under tracking control based on adaptive dynamic programming of the present invention;

FIG. 8 is a graph of the weight variation of the aircraft critic neural network of the present invention;

FIG. 9 is a diagram of the LSTM model parameters of the present invention;

FIG. 10 is a diagram of the initial conditions and terminal constraints of the aircraft of the present invention.

Detailed Description

To make the purpose and advantages of the present invention clearer, the present invention is described in detail below with reference to embodiments. It should be understood that the following text only describes one or several specific embodiments of the present invention and does not strictly limit the scope of protection specifically claimed.

Embodiment 1:

As shown in FIG. 1, a closed-loop tracking intelligent combined guidance method for an aerospace round-trip vehicle includes the following steps:

As shown in FIG. 2 and FIG. 3, S1 includes the following steps:

S101: Build an information data set and database for the RLV trajectory prediction problem by combining, at each time step, the system state data with the error data relative to the endpoint and applying data preprocessing operations such as information fusion and feature extraction;

To give the aircraft controller a control direction, the state of the aircraft at future time steps must first be predicted. The future state depends on the historical state information of the aircraft and on the state difference between its current state and its terminal state. The prediction model is therefore built by analyzing, over a large amount of empirical data, how these features correlate with future behavior.

Define the observation horizon as T_obs time steps and the prediction horizon as T_pred time steps. The historical state trajectory of the aircraft is O = {p^t | t = 1, 2, …, T_obs}, where p^t is the state information of the aircraft at time t, comprising: (1) geocentric distance r, (2) longitude θ, (3) latitude, (4) velocity v, (5) flight-path angle γ, (6) heading angle χ, and (7) bank angle σ. The set of state differences between the current state and the terminal state of the aircraft is denoted O_e, and the true future state information of the aircraft is Y = {p^t | t = T_obs+1, T_obs+2, …, T_obs+T_pred}. The trajectory prediction problem can therefore be described as:

Y = f_o(O, O_e) (1)

The goal of trajectory prediction is to find a mapping function f_o from the sets O and O_e to the set Y, so that the future trajectory of the aircraft can be predicted from its historical trajectory and its state differences from the terminal state.

Because a deep-learning-based prediction method must train the network on a large amount of data, an information data set of aircraft state features is established first, in the following three steps:

S10101: Information acquisition: the time-varying state information of the aircraft is obtained by means of sensors, radar and the like. The data set obtained by simulation in this way is 14-dimensional. It contains the state feature information of the aircraft (geocentric distance, longitude, latitude, velocity, flight-path angle, heading angle, bank angle) and the state difference information between the current state and the terminal state of the aircraft (geocentric distance difference, longitude difference, latitude difference, velocity difference, flight-path angle difference, heading angle difference, bank angle difference).

S10102: Data preprocessing: the data samples of the different state features differ by orders of magnitude. During network training, features with larger magnitudes would dominate, which slows convergence and lowers accuracy, so the data must be preprocessed. The present invention adopts zero-mean standardization, calculated as follows:

d'_i = (d_i - \bar{d}_i) / \tau_i

In the formula, d_i is the original sample data of each feature, d'_i is the sample data of that feature after processing, \bar{d}_i is the mean of all sample data of that feature, \tau_i is the standard deviation of all sample data of that feature, and i ∈ {1, 2, …, 14} indexes the 14 dimensions of the data set. That is, the raw data of each dimension are transformed into standardized data using the mean and standard deviation of that dimension over all samples. This brings all features onto a common scale, so that large differences between features do not slow learning or reduce network accuracy.
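The zero-mean standardization described above can be sketched with NumPy as follows. The function name `zero_mean_standardize`, the guard for constant features and the synthetic 14-column data are assumptions made for illustration; they are not taken from the patent.

```python
import numpy as np


def zero_mean_standardize(data: np.ndarray):
    """Standardize each column (feature) of a (samples, features) array.

    Returns the standardized data together with the per-feature mean and
    standard deviation, so that the same transform can be reused at
    prediction time.
    """
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    # Assumption: a constant feature is left unscaled to avoid division by zero.
    std_safe = np.where(std == 0.0, 1.0, std)
    return (data - mean) / std_safe, mean, std_safe


# Synthetic example: 14 features whose magnitudes differ by orders of magnitude.
rng = np.random.default_rng(0)
offsets = np.linspace(0.0, 6.4e6, 14)
scales = np.logspace(-2, 4, 14)
raw = rng.normal(size=(500, 14)) * scales + offsets
standardized, feature_mean, feature_std = zero_mean_standardize(raw)
```

After the transform every column has approximately zero mean and unit standard deviation, regardless of its original magnitude.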

S10103: Training sample construction: trajectory prediction is a supervised learning problem, so the trajectory data must be decomposed into training samples and labels. Starting from the first trajectory point in the data set and proceeding in chronological order, the aircraft state information at the first 20 trajectory points is selected to predict the state information of the next trajectory point, where the state information of each time step is the input of the corresponding cell of the neural network. Then, to keep the samples continuous in time, the separation interval is set to 1, that is, the next training sample is selected in the same way starting from the second trajectory point.
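The sliding-window construction described above (20 points as input, the next point as label, separation interval 1) can be sketched as follows. The function name `build_training_samples` is hypothetical, and using the full 14-dimensional feature vector of the next point as the label is an assumption; the text only says that the state information of the next trajectory point is predicted.

```python
import numpy as np


def build_training_samples(trajectory: np.ndarray, window: int = 20, stride: int = 1):
    """Split a (points, features) trajectory into (samples, labels).

    Each sample holds `window` consecutive trajectory points and its label is
    the point that immediately follows; consecutive samples start `stride`
    points apart.
    """
    n_points = trajectory.shape[0]
    if n_points <= window:
        raise ValueError("trajectory must contain more points than the window length")
    starts = range(0, n_points - window, stride)
    samples = np.stack([trajectory[s:s + window] for s in starts])
    labels = np.stack([trajectory[s + window] for s in starts])
    return samples, labels


# Example with a synthetic trajectory of 100 points and 14 features.
trajectory = np.arange(100 * 14, dtype=float).reshape(100, 14)
samples, labels = build_training_samples(trajectory)
```

With 100 points this yields 80 samples of shape (20, 14); the second sample starts at the second trajectory point, which is what a separation interval of 1 means here.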

S102: Obtain the coupling information between the environment and the system from the input data, build an LSTM network for state prediction, and update the prediction network weights with the back-propagation algorithm to obtain the predicted state model of the system, achieving real-time prediction of the system state trajectory at each time step.

After the data set is obtained, the present invention learns the mapping relationship with the LSTM algorithm. Considering the temporal nature of the aircraft state information, a back-propagation-through-time algorithm is designed, hyperparameters such as the number of training iterations and the learning rate are determined from experience and offline experiments, and the network weights are updated. Finally, in actual online operation, the trained aircraft state prediction network is used to predict the aircraft state online in real time. The prediction process is shown in FIG. 2 and FIG. 3.

First, the data containing the aircraft state and error information are input into the embedding layer, which extracts an embedded representation of the error information. The historical state information of the aircraft is then passed through the data processing layer for feature extraction and merged with this embedded error information. The merged data are the input of the LSTM layer, whose output is the predicted trajectory of the aircraft.

Here, S102 includes the following steps:

S10201: Embedding layer: the state error information of the aircraft is taken as input, and the input data are mapped to a new space through the embedding function φ(·). Its mathematical expression is:

In the formula, the input is the predicted state error information of the aircraft at time step t, the output is the spatial representation of that state error information at time step t, and W_1 is the weight of the embedding function. The embedding function φ(·) is defined as follows:

S10202: Data processing layer: the historical state information p^t of the aircraft is taken as input, and the error information is fused with the historical state information of the aircraft, as in the following formula:

In the formula, ⊙ denotes the Hadamard (element-wise) product of matrices and W_2 is the weight of the embedding function; the state term is the predicted state information of the aircraft at time step t. The fused information q^t therefore depends both on the historical state information of the aircraft and on the error between the aircraft state and the endpoint.

S10203: LSTM layer: the LSTM network predicts the future trajectory of the aircraft from the observed historical state information and the fused information q^t, as shown in the following formula:

In the formula, dp^t denotes the derivative of the aircraft state generated at time t; W_3 is the network weight of the LSTM; W_4 is the weight of the embedding function; o^t and h^{t-1} are the output and the hidden state of the LSTM, respectively; and the input of the LSTM is [q^t, dp^{t-1}]. The term dp^{t-1} carries the differential of the data, so the network accounts for the dynamics of the data during prediction. The LSTM network structure is shown in FIG. 3.

The input of the LSTM network at time t contains not only the input data x^t but also the hidden state h^{t-1} from the previous time step. In the forget gate, x^t and h^{t-1} pass through an activation function to produce a forgetting weight for the previous cell state C^{t-1}; the product of this weight with C^{t-1} forms one part of the cell state C^t at time t. In the input gate, x^t and h^{t-1} pass through activation functions to determine the information with which C^t is to be updated; this information forms the other part of C^t. The cell state C^t therefore contains the useful information from past time steps and the new information of the current time step. Finally, the output gate determines the output from x^t, h^{t-1} and the cell state. The output of the LSTM at time t thus reflects both useful past information and the current state. This structure is well suited to learning the relationships within time-series data and improves the handling of time-series prediction problems.
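The gate mechanism described above can be illustrated with one step of a standard LSTM cell in NumPy. This is a generic textbook formulation, not the patent's network: the function name `lstm_cell_step`, the stacked weight layout and the gate ordering (forget, input, candidate, output) are assumptions chosen for the sketch.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    x_t: (D,) input, h_prev and c_prev: (H,) previous hidden and cell state.
    W: (4*H, D+H) stacked gate weights, b: (4*H,) stacked biases,
    ordered as forget, input, candidate, output (assumed layout).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0:H])            # forget gate: how much of c_prev is kept
    i = sigmoid(z[H:2 * H])        # input gate: how much new information is written
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c_t = f * c_prev + i * g       # retained past information plus new information
    h_t = o * np.tanh(c_t)         # output depends on the updated cell state
    return h_t, c_t


# Example with 14 input features and a hidden size of 8 (illustrative sizes).
rng = np.random.default_rng(1)
D, H = 14, 8
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h_t, c_t = lstm_cell_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```

The line `c_t = f * c_prev + i * g` is the two-part composition of the cell state that the paragraph describes: one part retained from the past, one part newly written.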

S10204: Network training: once the sample data have been constructed, the training samples and prediction samples are normalized. The training samples are then input into the network model to train the network, and the network structure is adjusted according to the loss. Finally, the network performance is tested on the test samples to obtain the prediction results.

The network training process uses the mean squared error as the loss function, that is:

MSE = (1/n) \sum_{k=1}^{n} (\hat{Y}_k - Y_k)^2

In the formula, n is the number of samples in each training batch, \hat{Y}_k is the future state of the aircraft predicted by the LSTM model, and Y_k is its true future state. The optimization goal of the deep neural network is to drive the MSE toward 0. Training a deep neural network is the process of updating its weights: according to the above loss function, the weights are updated with the back-propagation algorithm, and the prediction results are obtained from the trained weights.
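The batch mean squared error used as the training loss can be written in a few lines. The function name `mse_loss` is hypothetical; for vector-valued states the sketch sums the squared error over all state components of a sample and averages over the n samples in the batch, which is one common reading of the definition.

```python
import numpy as np


def mse_loss(y_pred, y_true) -> float:
    """Mean over the batch of the squared error between predicted and true states."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    n = y_pred.shape[0]  # number of samples in the batch
    return float(np.sum((y_pred - y_true) ** 2) / n)


# Two samples with two state components each.
loss = mse_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 4.0]])
```

Here the squared errors are 4 and 9, so the loss over the batch of two samples is 6.5; a perfect prediction gives 0.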

The final prediction results for the individual states are denoted as: (1) geocentric distance r_ref, (2) longitude θ_ref, (3) latitude, (4) velocity v_ref, (5) flight-path angle γ_ref, (6) heading angle χ_ref, and (7) bank angle σ_ref.

In S2, establishing the RLV reentry-segment error model and transforming the constrained control problem includes the following steps:

An RLV reentry-segment error model is established, and a performance index function is designed that simultaneously reflects the formation error, the control quantity and the collision avoidance effect. The collision avoidance problem in the scenario is converted into a constraint problem through a safety barrier function, so that the collision avoidance control problem becomes an optimal stabilization control problem for the error system, which guarantees the safety of the system.

In the RLV reentry phase, the aircraft is assumed to be an unpowered point mass. The Earth is treated as a rotating ellipsoid, but the lateral force and the influence of the Earth's rotation during reentry are neglected, and the sideslip angle is taken as zero. The RLV reentry dynamics are then:

In the formula, the states are the geocentric distance r, the longitude θ, the latitude, the flight speed v, the flight-path angle γ and the heading angle χ; σ is the bank angle; m is the aircraft mass; g is the gravitational acceleration, g = μ_g / r^2, where μ_g is the gravitational parameter; L is the lift, L = q_d S C_L; D is the drag, D = q_d S C_D, where S is the aerodynamic reference area of the RLV and q_d is the dynamic pressure, q_d = 0.5 ρ v^2; ρ is the atmospheric density, given by an expression in which ρ_0 is the atmospheric density at sea level, R_e is the radius of the Earth and β is a constant coefficient; the lift coefficient C_L and the drag coefficient C_D are functions of the angle of attack α and will be given in the subsequent simulation.
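The algebraic relations stated above for gravity, dynamic pressure, lift and drag can be evaluated as in the following sketch. The function name `aero_quantities` is hypothetical, and the atmospheric density is passed in as a plain argument because its formula is not reproduced in the text; the numbers in the usage example are illustrative only.

```python
def aero_quantities(r: float, v: float, rho: float, S: float,
                    C_L: float, C_D: float, mu_g: float):
    """Evaluate g, q_d, L and D from the relations given in the text.

    r: geocentric distance, v: flight speed, rho: atmospheric density,
    S: aerodynamic reference area, C_L / C_D: lift / drag coefficients,
    mu_g: gravitational parameter. Consistent units are assumed.
    """
    g = mu_g / r ** 2          # gravitational acceleration, g = mu_g / r^2
    q_d = 0.5 * rho * v ** 2   # dynamic pressure, q_d = 0.5 * rho * v^2
    L = q_d * S * C_L          # lift, L = q_d * S * C_L
    D = q_d * S * C_D          # drag, D = q_d * S * C_D
    return g, q_d, L, D


# Illustrative values only (not the patent's simulation parameters).
g, q_d, L, D = aero_quantities(r=2.0, v=10.0, rho=0.02, S=3.0,
                               C_L=0.5, C_D=0.25, mu_g=8.0)
```

With these toy inputs the function returns g = 2, q_d = 1, L = 1.5 and D = 0.75, which makes it easy to check each relation by hand.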

Acting directly on the existing control quantity would introduce high-frequency chattering and impair the convergence of the problem, so a new control variable is needed. A new auxiliary control variable is introduced to decouple the control quantity from the state quantities; the new control quantity is:

The RLV reentry-phase dynamics model can then be rewritten as:

where x′ is the new state vector. The state matrix f(x′) and the control matrix B are:

B = [0,0,0,0,0,0,1]^T
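One reading that is consistent with the seven-element B above, offered as an assumption rather than as the patent's own equations, is that the bank angle σ is appended to the six original states and its rate becomes the new control. The latitude symbol φ is likewise assumed notation.

```latex
x' = \begin{bmatrix} r & \theta & \phi & v & \gamma & \chi & \sigma \end{bmatrix}^{T},
\qquad
u = \dot{\sigma},
\qquad
\dot{x}' = f(x') + B\,u .
```

Under this reading the last row of f(x′) is zero, so u enters only through the bank-angle equation and the original six state equations depend on σ purely as a state.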

Let the next-step predicted trajectory state output by the LSTM prediction model at the end of S1 serve as the reference input. The error system between the aircraft state and this reference input is then:

where e denotes the error between the current state of the aircraft and the target.

Combining the preceding equations gives the RLV reentry-phase error system:

To keep the flight safe and stable, the aircraft states and control quantities must satisfy their constraints. This step therefore designs a constraint term based on a safety barrier function, which provides the basis for the constraint terms in the performance index function defined later.

For safe and stable flight, the state of the RLV must strictly satisfy the following constraints:

1) The aircraft state x in the control algorithm satisfies the initial state condition x₀ and the terminal state condition x_f.

2) Owing to the performance limits of the aircraft, during reentry the control quantity u satisfies u_min ≤ u ≤ u_max and the state x satisfies x_min ≤ x ≤ x_max, where u_min and u_max are the lower and upper bounds of the control quantity, and x_min and x_max are the lower and upper bounds of the state.

Given these state constraints, the state safety domain and the state barrier function of the aircraft can be designed as follows:

Safety domain D_x:

D_x = {x ∈ Rⁿ | x_min ≤ x ≤ x_max} (14)

Barrier function μ_x:

where the constraint is judged to be violated whenever the aircraft state lies outside the safety domain D_x, and η is a positive real constant satisfying η > 1.

μ_x is thus a safety barrier function, and the performance index function is built on it. When the aircraft state approaches the boundary of the safety domain D_x, the performance index function tends to infinity, so at the next instant the controller moves in the direction that reduces the performance index function. The presence of μ_x therefore guarantees the safety of the system state.
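The box-shaped safety domain in (14) can be checked numerically with a few lines. The sketch below is illustrative only: `in_safety_domain` tests membership in D_x, and `boundary_margin` is a hypothetical helper that reports how close the state is to the nearest bound. Neither is the patent's barrier function μ_x.

```python
import numpy as np


def in_safety_domain(x, x_min, x_max):
    """Return True when every component satisfies x_min <= x <= x_max."""
    x, x_min, x_max = (np.asarray(a, dtype=float) for a in (x, x_min, x_max))
    return bool(np.all((x_min <= x) & (x <= x_max)))


def boundary_margin(x, x_min, x_max):
    """Smallest distance to a bound, scaled by the width of each interval.

    Positive inside the domain, zero on the boundary, negative outside.
    """
    x, x_min, x_max = (np.asarray(a, dtype=float) for a in (x, x_min, x_max))
    span = x_max - x_min
    return float(np.min(np.minimum(x - x_min, x_max - x) / span))
```

A shrinking margin is the situation in which a barrier term is expected to grow and push the controller back toward the interior of the domain.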

Similarly, for the control constraints of the aircraft, the control safety domain and the control barrier function can be designed as follows:

Safety domain D_u:

D_u = {u ∈ Rⁿ | u_min ≤ u ≤ u_max} (16)

Barrier function μ_u:

Next, the performance index function J of the system is defined as:

J = ∫_t^∞ U(e,u) dt (18)

where the instantaneous performance index U(e,u) is defined as:

U(e,u) = e^T Q e + u^T R u + μ_x e^T e + μ_u u^T u (19)

where Q and R are symmetric positive definite matrices.

The performance index function thus consists of four parts: the first term e^T Q e is the state error cost of the aircraft, the second term u^T R u is the control cost, the third term μ_x e^T e represents the state constraint, and the fourth term μ_u u^T u represents the control constraint.
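The four-term structure of (19) can be made concrete with a minimal sketch. It assumes the barrier values μ_x and μ_u have already been evaluated and are passed in as scalars; the function name `instantaneous_cost` is hypothetical.

```python
import numpy as np


def instantaneous_cost(e, u, Q, R, mu_x, mu_u):
    """Evaluate U(e,u) = e^T Q e + u^T R u + mu_x e^T e + mu_u u^T u."""
    e = np.atleast_1d(np.asarray(e, dtype=float))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    Q = np.asarray(Q, dtype=float)
    R = np.asarray(R, dtype=float)
    tracking = e @ Q @ e            # state error cost
    effort = u @ R @ u              # control cost
    state_penalty = mu_x * (e @ e)  # state constraint term
    ctrl_penalty = mu_u * (u @ u)   # control constraint term
    return float(tracking + effort + state_penalty + ctrl_penalty)
```

With Q = R = I the first two terms reduce to squared norms, so the barrier values act as state-dependent extra weights on the same quadratic forms.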

The optimal performance index function J^* can then be expressed as:

To achieve smooth flight of the RLV reentry-phase error system while meeting the constraints, the control objective is to find a control strategy u that minimizes the performance index function while keeping the system state within the safety domain D_x and the control quantity within the safety domain D_u. The control problem can then be written as:

S3: Design an aircraft tracking controller based on adaptive dynamic programming.

Designing the aircraft tracking controller based on adaptive dynamic programming in S3 includes the following steps:

A control algorithm based on adaptive dynamic programming is designed. A critic network is constructed to approximate the optimal performance index function and to solve for the optimal control strategy; the policy gradient method is used to update the norm of all neural network weights, and the network output is iterated until the optimal control strategy is obtained.

When the designed performance index function is continuously differentiable, the following Lyapunov equation is obtained:

where the gradient term denotes the partial derivative of the performance index function J with respect to the error e, and J(0) = 0. The Hamiltonian of the problem can be defined as:

When the performance index function is optimal, the Hamiltonian equation becomes the Hamilton-Jacobi-Bellman equation, that is:

The optimal performance index function J^* is obtained by solving the above Hamilton-Jacobi-Bellman equation, which has a unique solution. When the solution J^* exists and is continuously differentiable, the optimal control strategy can be solved for; the aircraft control strategy that minimizes the performance index function is:
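For orientation, the textbook infinite-horizon forms of the Lyapunov equation, the Hamiltonian, the Hamilton-Jacobi-Bellman equation and the optimal strategy are sketched below. The notation is an assumption: ∇J stands for the gradient of J with respect to e, and ė for the error dynamics. The patent's own equations may differ in detail.

```latex
\begin{aligned}
0 &= U(e,u) + \nabla J^{T}\,\dot{e}, \\
H(e,u,\nabla J) &= U(e,u) + \nabla J^{T}\,\dot{e}, \\
0 &= \min_{u} H\!\left(e,u,\nabla J^{*}\right), \\
u^{*} &= \arg\min_{u} H\!\left(e,u,\nabla J^{*}\right).
\end{aligned}
```

Because ė is affine in u and U(e,u) is quadratic in u, the minimization over u is a convex problem whenever the barrier values are treated as fixed weights.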

Because the Hamilton-Jacobi-Bellman equation is difficult to solve in practice, the present invention approximates the optimal performance index function J^* with a single-layer critic neural network:

J^* = W_c^T σ_c(e) + ε_c (26)

where W_c ∈ Rⁿ is the ideal weight vector of the critic neural network, σ_c(e) ∈ Rⁿ is its activation function, and ε_c ∈ R is its approximation error. Taking the partial derivative gives:

where the two gradient terms are the partial derivatives of the activation function and of the approximation error, respectively. Substituting this expression into the Hamiltonian equation gives:

where e_cH is the residual error caused by the critic neural network approximation. Because the ideal weights are unknown, the adaptive dynamic programming framework usually builds the critic neural network from an estimated weight vector to approximate the optimal performance index function, that is:

The approximate Hamiltonian can therefore be written as:

Defining the weight estimation error, we then have:

To tune the weight vector of the critic neural network, an error function is defined and minimized with the policy gradient method. The weight adjustment rule of the critic neural network is therefore set to:

where α_c > 0 is the learning rate of the critic neural network. The ideal control strategy can therefore be described as:

Its approximation is expressed as:

These three steps complete the closed-loop tracking intelligent combined guidance process for the air-to-space round-trip aircraft based on LSTM and adaptive dynamic programming.
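The critic weight tuning in S3 can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the patent: the activation vector is chosen as the squared error components, the error function is taken as one half of the squared approximate-Hamiltonian residual, and the weights follow plain (unnormalized) gradient descent with an explicit Euler step of size `dt`. All function names are hypothetical.

```python
import numpy as np


def sigma_c(e):
    """Assumed critic activation vector: squared error components."""
    return e ** 2


def grad_sigma_c(e):
    """Jacobian of sigma_c with respect to e (diagonal for this choice)."""
    return np.diag(2.0 * e)


def critic_step(W_hat, e, e_dot, U, alpha_c, dt):
    """One gradient-descent update of the estimated critic weights.

    Residual:  e_c = U + W_hat^T (grad_sigma_c(e) @ e_dot)
    Objective: E = 0.5 * e_c**2, whose gradient in W_hat is e_c * omega.
    Returns the updated weights and the residual before the update.
    """
    omega = grad_sigma_c(e) @ e_dot       # time derivative of sigma_c along e(t)
    e_c = U + W_hat @ omega               # approximate Hamiltonian residual
    W_new = W_hat - alpha_c * dt * e_c * omega
    return W_new, e_c
```

Repeating `critic_step` along a trajectory drives the residual toward zero for a sufficiently small product of `alpha_c` and `dt`; practical schemes often add a normalization term to keep the step size bounded.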

This technical solution adopts a prediction-correction framework and designs the controller by combining LSTM-based prediction with adaptive dynamic programming, which overcomes the "curse of dimensionality" of traditional dynamic programming control algorithms. The guidance law is updated iteratively through learning, so that the selected performance index function reaches its optimum within a finite time horizon; the resulting optimal feedback guidance law improves the autonomy of the controller.

Embodiment 2:

To verify the effectiveness of the algorithm proposed by the present invention, in this embodiment the algorithm is integrated in MATLAB/Simulink and simulation experiments are carried out. The main simulation process is as follows:

LSTM prediction model and network training parameter settings:

The configuration of the trajectory prediction model is shown in Figure 9. The input of the embedding layer network is 14-dimensional information consisting of the aircraft's geocentric distance r, longitude θ, latitude φ, speed v, flight-path angle γ, heading angle χ and bank angle σ, together with the corresponding errors relative to the terminal state. The output of the prediction network is the future position information of the aircraft.

The prediction network is first trained offline on a large amount of trajectory data, and the trained model is then used for online trajectory prediction. The proposed prediction model therefore needs less prediction time, which keeps the prediction real-time. The experiment uses the information of the 20 most recent time steps to predict future information, that is, T_obs = 20: each input is a 20*14 array of the aircraft's overall state information, and the output is a T_pred*2*3-dimensional vector of the aircraft's future state information. In this way the aircraft trajectory data yield a training sample set n of size 42970*20*14. The LSTM model randomly selects 80% of the data set for training and keeps the remaining 20% for testing; 15% of the training set is then used as a validation set to validate and fine-tune the network during training. The simulation uses single-step trajectory prediction, that is, T_pred = 1. The learning rate during backpropagation is uniformly set to 0.01, and smoothing interpolation is applied to prevent control fluctuations.
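The windowing and data split described above can be sketched as follows. The function names, the use of the full 14-dimensional state as the label, and the rounding of the split sizes are assumptions made for illustration; the patent does not spell out these details.

```python
import numpy as np


def make_windows(traj, t_obs=20, t_pred=1):
    """Slice a (T, F) trajectory into stride-1 samples.

    Returns inputs of shape (N, t_obs, F) and labels of shape (N, t_pred, F),
    where N = T - t_obs - t_pred + 1.
    """
    traj = np.asarray(traj, dtype=float)
    n = traj.shape[0] - t_obs - t_pred + 1
    if n <= 0:
        raise ValueError("trajectory is too short for the requested window")
    inputs = np.stack([traj[i:i + t_obs] for i in range(n)])
    labels = np.stack([traj[i + t_obs:i + t_obs + t_pred] for i in range(n)])
    return inputs, labels


def split_indices(n, rng, train_frac=0.8, val_frac_of_train=0.15):
    """Random 80/20 train/test split, then 15% of the training part as validation."""
    idx = rng.permutation(n)
    n_train_total = int(round(train_frac * n))
    n_val = int(round(val_frac_of_train * n_train_total))
    val = idx[:n_val]
    train = idx[n_val:n_train_total]
    test = idx[n_train_total:]
    return train, val, test
```

A trajectory with 30 time steps and 14 features, for example, yields 10 samples with T_obs = 20 and T_pred = 1; passing `np.random.default_rng(seed)` as `rng` makes the split reproducible.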

Adaptive dynamic programming tracking controller parameter settings:

The basic RLV parameters are set as follows: aircraft mass m = 104.305 kg; gravitational parameter μ_g = 3.986×10¹⁴ m³/s²; aircraft reference area S = 391.22 m²; earth radius R_e = 6.37×10⁶ m; sea-level atmospheric density ρ₀ = 0.00238 kg/m³; lift coefficient C_L = -0.207 + 2.04α; drag coefficient C_D = 0.0785 - 0.3529α. The system constant coefficients are β = 1.3875×10^-4 and k_Q = 9.44×10^-5. The aircraft control constraints, initial conditions and terminal constraints are shown in Figure 10. The aircraft state constraints are:

The neural network activation function is:

The initial weights of the critic neural network are W_c = [90, 80, 30, 20, 40, 75, 60, 15]^T, with Q = R = I, η = 1.5, and a neural network learning rate α_c = 0.5.

Figure 4 shows how the predicted values of each aircraft state vary with the time step. The prediction model achieves accurate prediction of the aircraft's altitude, speed, latitude, longitude, flight-path angle, heading angle and bank angle, so the designed LSTM-based state prediction model can accurately observe the aircraft state.

Figure 5 shows the absolute percentage error between the predicted and the true aircraft states. The prediction error stays below 6% for every state, which verifies the accuracy of the prediction model.
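The error metric plotted in Figure 5 is not defined explicitly in the text. A minimal sketch, assuming the usual element-wise definition of absolute percentage error, is given below; the function name and the small guard `eps` against division by zero are hypothetical.

```python
import numpy as np


def absolute_percentage_error(y_true, y_pred, eps=1e-12):
    """Element-wise 100 * |y_pred - y_true| / |y_true|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.abs(y_pred - y_true) / np.maximum(np.abs(y_true), eps)
```

Taking the maximum of the returned array over all time steps gives the figure to compare against the 6% bound. States that pass through zero, such as angles or longitude, inflate this metric, so in practice such channels are often scaled or offset before it is applied.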

Figure 6 shows the time histories of the aircraft's altitude, speed, latitude, longitude, flight-path angle, heading angle and bank angle. State tracking is essentially established at about 500 s, after which the aircraft maintains stable tracking flight, which verifies the effectiveness and stability of the designed tracking control algorithm based on adaptive dynamic programming.

Figure 7 shows the state errors of the aircraft in altitude, speed, latitude, longitude, flight-path angle, heading angle and bank angle. After about 500 s the state errors gradually converge to 0, which further verifies the effectiveness of the designed tracking control algorithm based on adaptive dynamic programming.

Figure 8 shows the evolution of the weights of the critic neural network. The weights converge stably within a finite time and approach their corresponding optimal values.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention. Structures, devices and operating methods not specifically described or explained in the present invention are, unless otherwise specified or limited, implemented by conventional means in the art.

Claims

1. A closed-loop tracking intelligent combined guidance method for an air-to-space round-trip aircraft, characterized in that it comprises the following steps: S1: designing an LSTM aircraft trajectory prediction algorithm;

The S1 comprises the following steps:

S101: Data information fusion and feature extraction are performed by combining the system state data and the error data from the end point at each time step to build an information data set and database for the RLV trajectory prediction problem;

S102: Obtain the coupling information between the environment and the system through input data, build an LSTM network for state prediction, use the back propagation algorithm to update the prediction network weights, obtain the system's predicted state model, and realize real-time prediction of the system state trajectory at each time step;

S103: During the flight, the obtained prediction model is used to continuously predict the flight endpoint, and the deviation between the predicted landing point and the expected endpoint is used as a control error and input to the controller to adjust the control amount;

S2: Establish the RLV reentry error model and transform the constraint control problem;

S3: Design an adaptive dynamic programming based vehicle tracking controller;

The S3 comprises the following steps:

A control algorithm based on adaptive dynamic programming is designed, and a critic network is constructed to approximate the optimal performance index function and solve for the optimal control strategy; the policy gradient method is used to update the norm of all neural network weights, and the network output is iterated to finally obtain the optimal control strategy.

2. The closed-loop tracking intelligent combined guidance method for an air-to-space round-trip aircraft according to claim 1, characterized in that: the method for establishing the information data set in S101 comprises the following steps:

S10101: Obtain information, relying on sensors and radars to obtain various status information of the aircraft that changes over time;

S10102: Data preprocessing: using zero mean standardization preprocessing method to preprocess the data;

S10103: Construct training samples and decompose trajectory data into training samples and labels.

3. The closed-loop tracking intelligent combined guidance method for an air-to-space round-trip aircraft according to claim 2, characterized in that: the construction of training samples comprises the following steps:

Starting from the first trajectory point in the data set, in chronological order, select the aircraft state information corresponding to the time of the first 20 trajectory points to predict the state information of the next trajectory point, where the state information of each time step is used as the input of the corresponding cell of the neural network, and the separation interval is selected as 1. Starting from the second trajectory point, the training samples are selected in the same way.

4. The closed-loop tracking intelligent combined guidance method for an air-to-space round-trip aircraft according to claim 1, characterized in that: said S102 comprises the following steps:

S10201: taking the state error information of the aircraft as input, and mapping the input data to a new space through an embedding function;

S10202: Taking the historical state information of the aircraft as input, fusing the error information with the historical state information of the aircraft;

S10203: Use LSTM network to predict the future trajectory of the aircraft based on the observed historical state information and fusion information;

S10204: Construct a prediction model through sample data, normalize the training samples and prediction samples, input the training samples into the network model to train the network, adjust the network structure according to the loss size, use the test samples to test the network performance, and obtain the prediction results.

5. The closed-loop tracking intelligent combined guidance method for an air-to-space round-trip aircraft according to claim 1, characterized in that: the information data set is 14-dimensional and includes state characteristic information of the aircraft and state difference information between the current state and the terminal state of the aircraft; the state characteristic information includes geocentric distance information, longitude information, latitude information, speed information, flight-path angle information, heading angle information and bank angle information; and the state difference information includes the geocentric distance difference, longitude difference, latitude difference, speed difference, flight-path angle difference, heading angle difference and bank angle difference.

6. The closed-loop tracking intelligent combined guidance method for an air-to-space round-trip aircraft according to claim 1, characterized in that: S2 comprises the following steps:

An RLV reentry-phase error model is established, and a performance index function that simultaneously reflects the formation error, the control quantity and the collision-avoidance effect is designed. The collision-avoidance problem in the scenario is converted into a constraint problem through the safety barrier function, and the collision-avoidance control problem is thereby converted into a stabilization problem for the error system.

7. The closed-loop tracking intelligent combined guidance method for an air-to-space round-trip aircraft according to claim 1, characterized in that: the state of the RLV must strictly satisfy the following constraints: 1) the aircraft state x in the control algorithm satisfies the initial state condition x₀ and the terminal state condition x_f; 2) owing to the performance limits of the aircraft, during reentry the control quantity u satisfies u_min ≤ u ≤ u_max and the state x satisfies x_min ≤ x ≤ x_max, where u_min and u_max are the lower and upper bounds of the control quantity, and x_min and x_max are the lower and upper bounds of the state.