CN116185061A

CN116185061A - Middle section guidance method based on integrated transfer learning

Info

Publication number: CN116185061A
Application number: CN202211516761.3A
Authority: CN
Inventors: 何绍溟; 金天宇; 王江; 李虹言; 刘子超
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2022-11-28
Filing date: 2022-11-28
Publication date: 2023-05-30

Abstract

The invention discloses a guidance method based on integrated transfer learning (ETLS). The method not only generates optimal guidance commands in real time but also, after the application scenario changes, adapts quickly to the new operating environment by fine-tuning with only a small amount of new data, while keeping nearly the same performance as before. In the method, several conventionally trained DNNs are combined with a meta-learner. This reduces the optimal midcourse guidance control problem of a new aircraft to finding an optimal weighting function and an optimal bias function. Both functions can be determined quickly from a small amount of data, which avoids the long training time and data shortage involved in retraining a new network. As a result, midcourse guidance control commands that meet the terminal-speed and accuracy requirements can be produced in a very short time for a new aircraft or a new application scenario.

Description

Middle section guidance method based on integrated transfer learning

Technical Field

The invention relates to midcourse guidance methods for aircraft, and in particular to a midcourse guidance method based on integrated transfer learning.

Background

During a mission, an aircraft passes through three phases: launch, midcourse guidance and terminal guidance. Midcourse guidance lasts the longest and is the most critical phase of the guidance process.

The guidance system is the core of a missile with a high hit rate, and the quality of the guidance algorithm directly affects hit accuracy. The algorithms widely used today, mature analytical and numerical methods, are referred to as traditional guidance algorithms and can ensure high guidance accuracy within foreseeable conditions. In recent years, researchers have introduced machine learning into the guidance field, giving rise to a series of data-based guidance algorithms, typically deep learning and reinforcement learning guidance algorithms. These methods require large amounts of data and time to train deep neural networks (DNNs). Once trained, however, a deep neural network can generate results quickly at low computational cost.

However, an inherent disadvantage of this approach is poor generalization. A well-trained deep neural network often fails to deliver satisfactory performance in an entirely new mission scenario, and in most cases does not work properly at all. This means that a new DNN must be retrained whenever the application scenario changes. Because training is time-consuming and requires large amounts of labeled data, data-based guidance algorithms are difficult to apply to tasks that can provide only small amounts of data or that are time-critical.

Consequently, when a new aircraft is designed or a mature aircraft is used in a new application scenario, its midcourse guidance control system often cannot meet the guidance requirements for lack of sufficient data support: the aircraft cannot achieve the maximum terminal speed in the midcourse phase, and the final hit accuracy is also affected.

In view of these problems, the inventors analyzed data-based midcourse guidance methods in depth and set out to design a guidance method based on integrated transfer learning (ETLS) that solves them.

Disclosure of Invention

To overcome these problems, the inventors carried out intensive research and designed a guidance method based on integrated transfer learning (ETLS). The method not only generates optimal guidance commands in real time but also, after the scenario changes, adapts quickly to the new operating environment by fine-tuning with only a small amount of new data, with nearly the same performance as before. Several conventionally trained DNNs are combined with a meta-learner, so that the optimal midcourse guidance control problem of a new aircraft reduces to finding an optimal weighting function and an optimal bias function. These two functions can be determined quickly from a small amount of data, avoiding the time cost and data shortage of retraining a new network. Midcourse control commands that meet the terminal-speed and accuracy requirements can therefore be produced in a very short time for a new aircraft or a new application scenario, thereby completing the invention.

Specifically, the invention provides a midcourse guidance method based on integrated transfer learning, characterized in that an optimal control command $a_c^{new}$ is obtained in real time during the midcourse guidance phase, and the aircraft's rudder actuators are driven by $a_c^{new}$ so that the aircraft flies along the planned trajectory, thereby completing a midcourse guidance task that maximizes terminal speed.

The optimal control command $a_c^{new}$ is obtained in real time by feeding the aircraft's state vector S into a pre-trained network E.

The training process of the network E comprises the following steps:

step 1, train at least 5 DNNs, which together form the base learner;

step 2, connect the base learner to the meta-learner to obtain network E, i.e., the outputs of the base learner are used as the inputs of the meta-learner;

step 3, train network E with a small amount of aircraft training data to obtain the trained network E.

In step 1, each DNN corresponds to one application scenario, i.e., the DNNs target different application scenarios.

In step 1, each DNN is a deep feedforward neural network with 3 hidden layers of 20 neurons each; the neurons in each hidden layer are fully connected to the neurons in the previous layer.

In step 1, the training process of each DNN includes:

step a, normalize and split the training data;

step b, feed the training-set data into the DNN and compare the predicted values with the reference values in the training set to obtain the loss;

step c, back-propagate the error and update the parameters;

step d, after each training pass, feed the validation-set and test-set data into the network and compute its loss as a measure of the network's generalization ability; training stops when the loss falls to a set value or the maximum number of epochs is reached.

The meta-learner is a single-hidden-layer feedforward neural network with at least 5 inputs, namely the outputs of at least 5 DNNs; the output of the meta-learner is the optimal control command $a_c^{new}$.

The single-hidden-layer feedforward neural network computes its output as follows:

where i is the index of an input of the single-hidden-layer feedforward neural network;

n is the number of inputs of the single-hidden-layer feedforward neural network;

$a_{ci}$ is the i-th input of the single-hidden-layer feedforward neural network;

$C_j$ denotes the weighting function;

$b_j$ denotes the bias function.

In step 3, "a small amount" means fewer than 500 sets of data.

The invention has the beneficial effects that:

(1) In the midcourse guidance method based on integrated transfer learning, several neural networks (the base learners) are trained separately for different aerodynamic models, and a small feedforward neural network then learns the mapping from the existing optimal controls to the optimal control in the new environment, so the method adapts quickly to a new environment with insufficient data;

(2) The method greatly reduces the amount of training data required while preserving guidance performance;

(3) Adaptation to a new environment can be completed within a few seconds, so the method is suitable for situations with strict time requirements;

(4) The method can be applied flexibly to other scenarios, such as minimum-control-energy guidance and minimum-time guidance.

Drawings

FIG. 1 is a schematic structural diagram of a DNN according to a preferred embodiment of the invention;

FIG. 2 shows the optimal guidance commands for different aerodynamic parameters under the same initial conditions;

FIG. 3 is a schematic structural diagram of a meta-learner according to a preferred embodiment of the invention;

FIG. 4 shows the training loss and validation loss of the meta-learner for different numbers of neurons in Example 1;

FIG. 5 shows a comparison of position errors in Example 2;

FIG. 6 shows a comparison of speed errors in Example 2;

FIG. 7 shows a comparison of terminal-angle errors in Example 2;

FIG. 8 shows a comparison of time errors in Example 2;

FIG. 9 shows a comparison of position errors in Example 3;

FIG. 10 shows a comparison of speed errors in Example 3;

FIG. 11 shows a comparison of terminal-angle errors in Example 3;

FIG. 12 shows a comparison of time errors in Example 3;

FIG. 13 shows a comparison of position errors in Example 4;

FIG. 14 shows a comparison of speed errors in Example 4;

FIG. 15 shows a comparison of terminal-angle errors in Example 4;

FIG. 16 shows a comparison of time errors in Example 4;

FIG. 17 shows the aircraft trajectories in Example 5;

FIG. 18 shows speed versus time in Example 5;

FIG. 19 shows flight-path angle versus time in Example 5;

FIG. 20 shows the control command versus time in Example 5.

Detailed Description

The invention is further described in detail below by means of the figures and examples. The features and advantages of the present invention will become more apparent from the description.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

In the midcourse guidance method based on integrated transfer learning, the optimal control command $a_c^{new}$ is obtained in real time during the midcourse guidance phase. The command update rate can be as high as the computation speed of the neural network allows: the network used in this application needs no more than 1 ms per evaluation (about 0.3 to 0.6 ms), so the command could be computed at 1000 Hz or more. The update rate is preferably set to 20 Hz; higher rates can achieve higher accuracy. The aircraft's rudder actuators are driven by $a_c^{new}$ so that the aircraft flies along the planned trajectory, completing a midcourse guidance task that maximizes terminal speed. Terminal-speed maximization means maximizing the aircraft's speed at the handover point between the midcourse and terminal guidance phases when the aircraft flies the optimal trajectory. In this application, the aircraft may be a newly designed aircraft or an existing aircraft applied to a new scenario; in either case, only a small amount of flight trajectory data is needed to obtain the optimal control command $a_c^{new}$ that drives the aircraft's rudder actuators.

Preferably, the optimal control command $a_c^{new}$ is obtained in real time by feeding the aircraft's state vector S into the pre-trained network E. S is obtained in real time from the aircraft's onboard sensors. In this application the choice of state vector is not fixed; appropriate parameters should be selected for each task. For example, the state vector S may be

$S = [x, y, \theta, V, x_f, y_f, \theta_f]$

where x and y are the horizontal and vertical coordinates of the aircraft, $\theta$ is the flight-path angle, V is the aircraft speed, and $x_f$, $y_f$ and $\theta_f$ are the terminal position coordinates and the terminal flight-path angle.

In a preferred embodiment, the training process of network E comprises the following steps:

step 1, train at least 5 DNNs, which together form the base learner;

step 2, connect the base learner to the meta-learner to obtain network E, i.e., the outputs of the base learner are used as the inputs of the meta-learner; the transfer learning module of this application consists of the meta-learner;

step 3, train network E with a small amount of aircraft training data to obtain the trained network E.

Preferably, in step 1, each DNN corresponds to one application scenario, i.e., the DNNs target different application scenarios. An application scenario comprises the system dynamics model, the operating environment and the mission objective; two application scenarios are different if they differ in any of these aspects.

For example, suppose there are five types of aircraft, referred to as aircraft 1 to aircraft 5. Because their configurations differ, their aerodynamic coefficients also differ, as shown in Tables 1 and 2: Table 1 gives the baseline aerodynamic coefficients, and the coefficients in Table 2 are all variations of this baseline. Aircraft 1 to 5 have sufficient flight data, so each has a trained DNN, numbered B₁ to B₅; each of these base learners can control the corresponding aircraft in the midcourse guidance task of maximizing terminal speed under a terminal-angle constraint. If a new aircraft with the aerodynamic coefficients shown in Table 3 is now put into service, it is difficult to train a new DNN for it because data are lacking. The ETLS proposed in this application addresses exactly this data-scarce situation.

Table 1 Baseline aerodynamic coefficients

Table 2 Aerodynamic coefficients of aircraft 1 to 5

Table 3 Aerodynamic coefficients of the new aircraft

Preferably, a method for generating the data set is needed for the base learners. The optimal control problem is highly nonlinear, so no analytical solution exists and it can only be solved numerically. Many numerical methods exist, but they differ in convergence and computation speed. This application uses hp-FRPM as the solution method. The method is insensitive to the initial guess and converges quickly; in the example scenario of this application it solves one optimal control problem in about 1 second on average. The algorithm can be invoked directly through the existing solver GPOPS-II. The greatest advantage of hp-FRPM is that it adaptively adds or removes discretization points according to the gradients of the state and control variables: more points are placed where the variables change steeply, and fewer where they change gently. This distributes the solved optimal data more reasonably, which helps the neural network train thoroughly.

This application randomly generates 15000 initial conditions, with random initial launch angles and random horizontal target positions, giving 15000 corresponding optimal control problems. The solution tolerance of GPOPS-II is set to $1\times10^{-8}$. The resulting pairs of states and optimal controls $[S, a_c]$ are stored as the training set.

Preferably, in step 1, each DNN is a deep feedforward neural network with 3 hidden layers of 20 neurons each; the neurons in each hidden layer are fully connected to the neurons in the previous layer. The structure of the DNN is shown in FIG. 1.

Each layer of the neural network is computed as follows:

$L_{i+1} = \sigma(W_i L_i + b_i)$

where $W_i$ is the weight matrix, $b_i$ is the bias vector, $L_{i+1}$ is the output of layer i+1, and $\sigma$ is the nonlinear activation function. The activation function is an indispensable part of a neural network, and different activation functions affect the training result differently.

This patent selects its activation function because that function is better suited to fitting this problem.
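For illustration only, the layer rule $L_{i+1} = \sigma(W_i L_i + b_i)$ applied to a base learner with 3 hidden layers of 20 neurons can be sketched in NumPy as below. This is a sketch under stated assumptions, not the patent's implementation: tanh is assumed for σ, the 7-element input assumes the example state vector S, and the linear output layer, the weight initialization and the function names are hypothetical.

```python
import numpy as np

def init_base_learner(n_in=7, hidden=(20, 20, 20), n_out=1, seed=0):
    """Create (W_i, b_i) pairs for a fully connected net with 3 hidden layers of 20 neurons."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, n_out]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_out, fan_in))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def base_learner_forward(params, s, sigma=np.tanh):
    """Apply L_{i+1} = sigma(W_i L_i + b_i) on the hidden layers; the output layer is linear."""
    L = np.asarray(s, dtype=float)
    for W, b in params[:-1]:
        L = sigma(W @ L + b)
    W_out, b_out = params[-1]
    return W_out @ L + b_out  # predicted guidance command a_c, shape (1,)

params = init_base_learner()
a_c = base_learner_forward(params, np.zeros(7))
print([W.shape for W, _ in params])  # [(20, 7), (20, 20), (20, 20), (1, 20)]
```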

Preferably, in step 1, the training process of the DNN includes:

step a, normalize and split the training data. Normalization improves the training speed and stability of the neural network, so this application normalizes the data before training.

When splitting, 70% of the data are used as the training set, 15% as the validation set and 15% as the test set.
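A minimal sketch of step a, assuming column-wise min-max scaling to [-1, 1] (an assumption; the specific normalization formula is not given in this text) and a random 70/15/15 split. The helper names are hypothetical. The scaling bounds are best taken from the training portion only and then reused for the validation and test sets and for onboard inference.

```python
import numpy as np

def fit_minmax(X):
    """Column-wise minimum and maximum, computed on the training data."""
    return X.min(axis=0), X.max(axis=0)

def apply_minmax(X, lo, hi):
    """Map each column linearly to [-1, 1]; constant columns map to -1."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (X - lo) / span - 1.0

def split_indices(n, fractions=(0.70, 0.15), seed=0):
    """Shuffle n sample indices into 70% train, 15% validation and the remaining 15% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(15000)  # 15000 solved optimal control problems
print(len(train), len(val), len(test))   # 10500 2250 2250
```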

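step b, feed the training-set data into the DNN and compare the predicted values with the reference values in the training set to obtain the loss; this application uses the mean squared error as the loss function:

$\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N}\left(\hat{a}_{c,k} - a_{c,k}\right)^2$

where N is the number of samples, $\hat{a}_{c,k}$ is the predicted command and $a_{c,k}$ is the reference command of the k-th sample.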

step c, back-propagate the error and update the parameters. The loss computed by the loss function is propagated through the neural network with the back-propagation algorithm, and each parameter $W_i$ and $b_i$ is then updated according to the loss. This application uses the Levenberg-Marquardt (L-M) algorithm for the update:

$x_{k+1} = x_k - (J^T J + \mu I)^{-1} J^T e$

where $x_k$ is the vector of network weights and biases at iteration k, e is the vector of network errors, J is the Jacobian matrix of e with respect to the weights and biases, I is the identity matrix, and $\mu$ is the damping factor, which controls the size of the trust region; its initial value is 0.001. When the training loss decreases, $\mu$ is set to $\mu' = 0.1\mu$, which speeds up convergence; if the loss increases, $\mu' = 10\mu$. This adaptive adjustment of $\mu$ lets the L-M algorithm converge very quickly when training relatively small neural networks.

step d, after each training pass (1 epoch), feed the validation-set and test-set data into the network and compute its loss as a measure of the network's generalization ability. Training stops when the loss falls below $1\times10^{-6}$ or when the maximum number of epochs is reached; because the L-M algorithm converges very quickly, the maximum is set to 200 epochs.
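The L-M update and the μ schedule of steps c and d can be illustrated on a toy least-squares fit in which a two-element vector x stands in for the network's weights and biases. In this sketch (all names hypothetical), e is the prediction minus the target, J is the Jacobian of e, a rejected step still counts as an epoch, and training stops when the MSE drops below 1e-6 or after 200 epochs. For a real DNN, J would be obtained by back-propagation over all weights and biases.

```python
import numpy as np

def levenberg_marquardt(f, jac, x0, y, mu=1e-3, max_epochs=200, tol=1e-6):
    """Minimise mean((f(x) - y)**2) with x_{k+1} = x_k - (J^T J + mu*I)^{-1} J^T e."""
    x = np.asarray(x0, dtype=float)
    e = f(x) - y                        # error vector e (prediction minus target)
    loss = float(np.mean(e ** 2))       # MSE loss
    for _ in range(max_epochs):         # at most 200 epochs
        if loss < tol:                  # stop once the loss is below 1e-6
            break
        J = jac(x)                      # Jacobian of e with respect to x, shape (N, P)
        step = np.linalg.solve(J.T @ J + mu * np.eye(x.size), J.T @ e)
        x_new = x - step
        e_new = f(x_new) - y
        loss_new = float(np.mean(e_new ** 2))
        if loss_new < loss:             # loss decreased: accept the step, mu' = 0.1 mu
            x, e, loss = x_new, e_new, loss_new
            mu *= 0.1
        else:                           # loss increased: reject the step, mu' = 10 mu
            mu *= 10.0
    return x, loss

# Toy problem: recover p = (2.0, 0.5) in y = p0 * tanh(p1 * t) from exact data.
t = np.linspace(-3.0, 3.0, 25)
y = 2.0 * np.tanh(0.5 * t)

def f(p):
    return p[0] * np.tanh(p[1] * t)

def jac(p):
    th = np.tanh(p[1] * t)
    return np.column_stack([th, p[0] * t * (1.0 - th ** 2)])

p_hat, final_loss = levenberg_marquardt(f, jac, [1.0, 1.0], y)
print(p_hat, final_loss)
```

Rejecting a step and raising μ moves the update toward a short gradient-descent step, while accepting it and lowering μ moves it toward a Gauss-Newton step, which is where the fast convergence on small problems comes from.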

In a preferred embodiment, the meta-learner is a single-hidden-layer feedforward neural network with at least 5 inputs, namely the outputs of at least 5 DNNs; the output of the meta-learner is the optimal control command $a_c^{new}$.

FIG. 2 shows the optimal guidance command profiles (expressed as lateral acceleration) that aircraft with different aerodynamic parameters require, under the same initial conditions, to complete terminal-speed-optimal midcourse guidance; in FIG. 2, Aero_i denotes the result for the i-th set of aerodynamic coefficients in Table 2. FIG. 2 shows that the optimal solution for one aerodynamic model differs from that for another; however, it also clearly shows strong similarity between the different solutions over the same time interval. This means that a new acceleration command can be computed as a weighted combination of existing optimal accelerations. On this basis, the single-hidden-layer feedforward neural network computes its output as follows:

where $a_{ci}$ is the i-th input of the single-hidden-layer feedforward neural network;

$C_j$ denotes the weighting function;

$b_j$ denotes the bias function.

$C_j$ and $b_j$ are the parameters to be determined. The optimal midcourse guidance control problem of a new aircraft is thereby reduced to finding an optimal weighting function $C_j$ and an optimal bias function $b_j$; if these two functions can be determined quickly from a small amount of data, the time cost and data shortage of retraining a new network are avoided.

The meta-learner, which is essentially a single-hidden-layer feedforward neural network, is structured as shown in FIG. 3. Its inputs are the optimal guidance commands $a_{ci}$ generated by the trained base learners, and its output is the optimal control command $a_c^{new}$ required by the new missile.
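The data flow of network E (state S into the frozen base learners B₁ to B₅, their commands into the meta-learner, and $a_c^{new}$ out) can be sketched as below. The meta-learner is written as a generic single-hidden-layer network with 40 hidden neurons, the size selected in Example 1. The tanh activation, linear output, initialization, the five stand-in base learners and the sample state values are all hypothetical, and this generic network is not claimed to be the patent's exact weighting formula.

```python
import numpy as np

def init_meta_learner(n_base=5, n_hidden=40, seed=0):
    """Single-hidden-layer feedforward meta-learner: n_base inputs -> n_hidden -> 1 output."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_base), size=(n_hidden, n_base))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)
    b2 = 0.0
    return W1, b1, w2, b2

def meta_forward(meta, a_base):
    """Map the base-learner commands a_c1..a_cn to a_c^new."""
    W1, b1, w2, b2 = meta
    h = np.tanh(W1 @ a_base + b1)   # hidden layer (tanh assumed)
    return float(w2 @ h + b2)       # linear output: a_c^new

def network_e(S, base_learners, meta):
    """Network E: state S -> frozen base learners B_1..B_n -> meta-learner."""
    a_base = np.array([B(S) for B in base_learners])  # a_c1, ..., a_cn
    return meta_forward(meta, a_base)

# Hypothetical stand-ins for five trained base learners B_1..B_5.
base_learners = [lambda S, k=k: 0.1 * k * S[2] for k in range(1, 6)]
meta = init_meta_learner()
S = np.array([0.0, 0.0, 0.3, 250.0, 10000.0, 0.0, -0.5])  # hypothetical state vector
print(network_e(S, base_learners, meta))
```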

In a preferred embodiment, in step 3, the meta-learner is trained in a similar way to the base learners. First, a small data set of pairs $[S, a_c^{new}]$ is prepared. The aircraft state vector S is passed through the 5 existing base learners to obtain 5 outputs, $a_{c1}$ to $a_{c5}$. These five outputs are fed into the meta-learner to obtain the network's prediction, which is compared with the true value in the training set; the loss is computed and back-propagated until it is sufficiently small, the specific termination condition being a loss below $1\times10^{-6}$. This yields the trained meta-learner and hence network E.
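A sketch of step 3 under stated assumptions: the base learners are frozen, so their outputs for the small new-aircraft data set can be computed once and stored as a matrix A, and only the meta-learner's parameters are fitted. For brevity, plain full-batch gradient descent is swapped in here for the L-M update used for the base learners; the data below are synthetic and the function name is hypothetical.

```python
import numpy as np

def train_meta_learner(A, y, n_hidden=40, lr=0.01, max_epochs=2000, tol=1e-6, seed=0):
    """Fit a_c^new approx w2 . tanh(W1 a + b1) + b2 by full-batch gradient descent on the MSE.

    A: (N, n) array; row k holds the frozen base-learner outputs a_c1..a_cn for sample k.
    y: (N,) array of reference commands a_c^new for the new aircraft.
    """
    rng = np.random.default_rng(seed)
    N, n = A.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n_hidden, n))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)
    b2 = 0.0
    history = []
    for _ in range(max_epochs):
        H = np.tanh(A @ W1.T + b1)              # hidden activations, shape (N, n_hidden)
        err = H @ w2 + b2 - y                   # prediction minus reference
        loss = float(np.mean(err ** 2))
        history.append(loss)
        if loss < tol:                          # termination condition: MSE < 1e-6
            break
        g = 2.0 * err / N                       # dMSE / dprediction
        dZ = np.outer(g, w2) * (1.0 - H ** 2)   # back-propagate through tanh
        w2 = w2 - lr * (H.T @ g)
        b2 = b2 - lr * g.sum()
        W1 = W1 - lr * (dZ.T @ A)
        b1 = b1 - lr * dZ.sum(axis=0)
    return (W1, b1, w2, b2), history

# Synthetic stand-in data: 400 samples of 5 base-learner outputs.
rng = np.random.default_rng(1)
A = rng.normal(size=(400, 5))
y = A @ np.array([0.3, 0.2, 0.2, 0.2, 0.1]) + 0.05
meta, history = train_meta_learner(A, y)
print(f"MSE: {history[0]:.4f} -> {history[-1]:.6f}")
```

Because the base learners' outputs are fixed inputs, each epoch only touches the small meta-learner, which is consistent with the short adaptation times reported for the meta-learner in Example 1.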

Preferably, in step 3, "a small amount" means fewer than 500 sets of data, and the aircraft training data are aircraft trajectory data; that is, each set of training data contains all the data of one aircraft trajectory.
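Because each set of data is a whole trajectory, the meta-learner's training pairs are obtained by stacking the samples of fewer than 500 trajectories. A hypothetical helper for this, assuming each trajectory is stored as an array of state vectors with one command per state, might look as follows; the 12 trajectories of 40 samples are synthetic and only mirror the E(5,12) configuration of Example 4.

```python
import numpy as np

def build_meta_training_set(trajectories, max_sets=500):
    """Flatten whole trajectories into (S, a_c^new) training pairs.

    trajectories: list of (states, commands), states of shape (T_k, d), commands of shape (T_k,).
    Each list element is one "set" of data, i.e. all samples of one flight trajectory.
    """
    if not 0 < len(trajectories) < max_sets:
        raise ValueError("expected between 1 and max_sets - 1 trajectories")
    S = np.vstack([np.asarray(states, dtype=float) for states, _ in trajectories])
    a = np.concatenate([np.asarray(cmds, dtype=float) for _, cmds in trajectories])
    if S.shape[0] != a.shape[0]:
        raise ValueError("each state needs exactly one command")
    return S, a

# Hypothetical: 12 trajectories of 40 samples each, 7-element state vector.
rng = np.random.default_rng(0)
trajs = [(rng.normal(size=(40, 7)), rng.normal(size=40)) for _ in range(12)]
S, a = build_meta_training_set(trajs)
print(S.shape, a.shape)  # (480, 7) (480,)
```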

Example 1

Single-hidden-layer feedforward neural networks with different numbers of neurons were used as meta-learners: 9 meta-learners with 10 to 90 neurons in steps of 10. The training inputs of each meta-learner are the outputs of the 5 trained DNNs, and its training targets are the optimal control commands;

$C_j$ denotes the weighting function;

$b_j$ denotes the bias function.

The 9 meta-learners were trained and validated, and their training and validation losses were recorded. The results are shown in FIG. 4: the loss is smallest with 40 neurons, so the numerical simulations in this application use the 40-neuron meta-learner.

The training times of the 9 meta-learners are given in Table 4 below. Retraining a DNN is known to take 2 hours, whereas Table 4 shows that the meta-learners learn the control commands under the new aerodynamic parameters extremely quickly.

Table 4 Training time of meta-learners with different numbers of neurons

The trained network E, composed of the base learner and the meta-learner, can be loaded directly into the onboard computer. Because the method requires little computation, a single evaluation of the optimal control command takes only about 0.3 to 0.6 ms on a laptop computer, and can be faster on a customized onboard computer. After launch, the onboard computer obtains the aircraft state vector S every 0.05 seconds and feeds it into network E, which quickly returns the optimal control command $a_c^{new}$ required at that moment; the aircraft only needs to follow the commands of network E to achieve the maximum terminal speed under the terminal-angle constraint.
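The onboard cycle can be sketched as a loop that queries network E every 0.05 s (20 Hz) and holds the command in between; at 0.3 to 0.6 ms per evaluation this leaves ample margin in each 50 ms period. The sketch is hypothetical throughout: network_e is any callable mapping S to a lateral-acceleration command, and the constant-speed kinematics with no gravity or drag merely stand in for the real aircraft model.

```python
import math

def fly_midcourse(network_e, state0, target, t_end, period=0.05, substeps=10):
    """Query network E every `period` seconds and hold the command in between.

    state0 = (x, y, theta, V); target = (x_f, y_f, theta_f).
    Toy kinematics (constant speed, no gravity or drag) stand in for the real model.
    """
    x, y, theta, V = state0
    dt = period / substeps
    commands = []
    for k in range(int(round(t_end / period))):
        S = (x, y, theta, V, *target)    # state vector fed to network E
        a_c = network_e(S)               # lateral acceleration command
        commands.append((k * period, a_c))
        for _ in range(substeps):        # zero-order hold until the next update
            x += V * math.cos(theta) * dt
            y += V * math.sin(theta) * dt
            theta += a_c / V * dt        # a_c rotates the velocity vector
    return (x, y, theta, V), commands

final, cmds = fly_midcourse(lambda S: 0.0, (0.0, 0.0, 0.0, 100.0), (5000.0, 0.0, 0.0), t_end=1.0)
print(len(cmds), round(final[0], 6))  # 20 100.0
```

With a zero-command stand-in and an initial speed of 100 along the x-axis, one second of flight gives 20 command updates and x of about 100.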

Example 2

The trained DNNs corresponding to the five aircraft, B₁, B₂, B₃, B₄ and B₅, are used to form the base learner, which is connected to the meta-learner to obtain network E. The network E trained with 100 new trajectories is denoted E(5,100), and the one trained with 500 new trajectories is denoted E(5,500).

The errors of E(5,100), E(5,500) and the DNNs corresponding to the five aircraft were compiled: the position errors are shown in FIG. 5, the speed errors in FIG. 6, the terminal-angle errors in FIG. 7 and the time errors in FIG. 8.

As FIGS. 5 to 8 show, the errors of network E trained with fewer than 500 new trajectories are very close to those of a retrained DNN, which demonstrates the effectiveness of the guidance method based on integrated transfer learning.

Example 3

Two, three, four or five of the trained DNNs corresponding to the five aircraft are used to form the base learner, which is connected to the meta-learner to obtain network E; each network E is trained with 100 new trajectories, giving E(2,100), E(3,100), E(4,100) and E(5,100).

The resulting errors were compiled: the position errors are shown in FIG. 9, the speed errors in FIG. 10, the terminal-angle errors in FIG. 11 and the time errors in FIG. 12.

As FIGS. 9 to 12 show, guidance performance degrades significantly when the base learner contains only 2 DNNs, while adding further DNNs yields only limited improvement. The base learner should therefore preferably contain more than 2 DNNs, but need not contain more than 5.

Example 4

The trained DNNs corresponding to the five aircraft form the base learner, which is connected to the meta-learner to obtain network E; network E is trained with different numbers of new trajectories, giving E(5,6), E(5,12), E(5,25), E(5,50), E(5,100), E(5,200) and E(5,500).

The resulting errors were compiled: the position errors are shown in FIG. 13, the speed errors in FIG. 14, the terminal-angle errors in FIG. 15 and the time errors in FIG. 16.

As FIGS. 13 to 16 show, the performance of the ETLS method deteriorates rapidly when the training data are reduced to 6 trajectories, while increasing the data to 500 trajectories brings little improvement. These results show that ETLS can learn the optimal control under new aerodynamic parameters with only a small amount of data.

Example 5

Three guidance schemes are used to simulate the same aircraft: network E(5,12) from Example 4, DNN B₅ from Example 2, and the conventional trajectory shaping guidance law (TSG; see Zarchan, P., Tactical and Strategic Missile Guidance, Vol. 239, American Institute of Aeronautics and Astronautics, Inc., 2012). The resulting aircraft trajectories are shown in FIG. 17, speed versus time in FIG. 18, flight-path angle versus time in FIG. 19, and the control command versus time in FIG. 20.

As FIGS. 17 to 20 show, the performance of the midcourse guidance method based on integrated transfer learning, i.e., network E(5,12), is very close to that of the optimal solution, which demonstrates its effectiveness. The conventional TSG method satisfies the terminal-angle constraint but cannot optimize the terminal speed, and the old neural network B₅ does not work properly under the new aerodynamic parameters, so the mission fails.

The invention has been described above in connection with preferred embodiments, which are, however, exemplary only and for illustrative purposes. On this basis, the invention can be subjected to various substitutions and improvements, and all fall within the protection scope of the invention.

Claims

1. A midcourse guidance method based on integrated transfer learning, characterized in that an optimal control command $a_c^{new}$ is obtained in real time during the midcourse guidance phase, and the aircraft's rudder actuators are driven by $a_c^{new}$ so that the aircraft flies along the planned trajectory, thereby completing a midcourse guidance task that maximizes terminal speed.

2. The method for mid-stage guidance based on integrated transfer learning of claim 1, wherein,

the optimal control command $a_c^{new}$ is obtained in real time by feeding the state vector S of the aircraft into a pre-trained network E in real time.

3. The method for mid-stage guidance based on integrated transfer learning of claim 1, wherein,

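the training process of the network E comprises the following steps:

step 1, training at least 5 DNNs, which together form a base learner;

step 2, connecting the base learner to a meta-learner to obtain network E, i.e., using the outputs of the base learner as the inputs of the meta-learner;

step 3, training network E with a small amount of aircraft training data to obtain the trained network E.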

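4. The method for mid-stage guidance based on integrated transfer learning of claim 3, wherein

in step 1, each DNN corresponds to one application scenario, i.e., the DNNs target different application scenarios.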

5. The method for mid-stage guidance based on integrated transfer learning of claim 3,

in step 1, the DNN is a deep feedforward neural network with 3 hidden layers of 20 neurons each, and the neurons in each hidden layer are fully connected to the neurons in the previous layer.

6. The method for mid-stage guidance based on integrated transfer learning of claim 3,

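in step 1, the training process of the DNN includes:

step a, normalizing and splitting the training data;

step b, feeding the training-set data into the DNN and comparing the predicted values with the reference values in the training set to obtain a loss;

step c, back-propagating the error and updating the parameters;

step d, after each training pass, feeding the validation-set and test-set data into the network and computing its loss as a measure of its generalization ability; training stops when the loss falls to a set value or a maximum number of epochs is reached.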

7. The method for mid-stage guidance based on integrated transfer learning of claim 3,

the meta-learner is a single-hidden-layer feedforward neural network with at least 5 inputs, namely the outputs of at least 5 DNNs; the output of the meta-learner is the optimal control command $a_c^{new}$.

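8. The method for mid-stage guidance based on integrated transfer learning of claim 7, wherein

the single-hidden-layer feedforward neural network computes its output from its inputs, where i is the index of an input, n is the number of inputs and $a_{ci}$ is the i-th input;

$C_j$ denotes the weighting function;

$b_j$ denotes the bias function.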

9. The method for mid-stage guidance based on integrated transfer learning of claim 3,

in step 3, "a small amount" means fewer than 500 sets of data.