CN109614631B - Fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning


Info

Publication number: CN109614631B
Application number: CN201811217192.6A
Authority: CN (China)
Prior art keywords: optimization, network, reinforcement learning, parameters, experience
Legal status: Active (granted)
Other versions: CN109614631A (Chinese)
Inventors: 闫星辉, 朱纪洪, 匡敏驰, 王吴凡, 史恒
Original and current assignee: Tsinghua University
Priority and filing date: 2018-10-18
Application filed by Tsinghua University

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/23Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation


Abstract

The invention discloses a fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning. It addresses the tendency of existing aerodynamic optimization methods to fall into local optima or to converge slowly, and it removes manual intervention from the final high-fidelity optimization stage, further improving optimization efficiency. The technical scheme is as follows: first, reinforcement learning environments based on semi-empirical estimation and on high-fidelity fluid simulation are established; a reinforcement learning neural network is then built and a reward function is set. Using the global optimization capability of reinforcement learning, optimization experience is extracted from the semi-empirical estimation method during network training and stored in the network parameters. A second reinforcement learning neural network is then constructed, the extracted optimization experience is migrated to it by transfer learning, and this network is applied to aerodynamic optimization based on high-fidelity fluid simulation; training it finally yields high-fidelity design parameters with excellent aerodynamic performance. Compared with the background-art methods, the method converges faster while retaining strong global optimization capability, and is of great engineering value for high-fidelity aerodynamic optimization.

Description

Fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning
Technical Field
The invention belongs to the technical field of aircraft engineering, and particularly relates to a global aerodynamic optimization method of an aircraft based on reinforcement learning and transfer learning.
Background
Aerodynamic optimization refers to designing the shape and relative positions of the main components of an aircraft so as to obtain the design with the best aerodynamic performance while satisfying given constraints. Mainstream aerodynamic optimization methods fall into two broad categories: gradient-based methods and traditional intelligent methods. Gradient-based methods optimize single-extremum problems efficiently, but aerodynamic optimization is mostly a complex multi-extremum problem, on which gradient-based methods easily fall into local optima and cannot meet the requirements of global aerodynamic optimization. Traditional intelligent methods mainly include genetic algorithms, particle swarm algorithms, and the like; these have better global optimization capability and suit complex multi-extremum problems, but they converge slowly and are difficult to apply directly to time-consuming high-fidelity fluid simulation. They are therefore usually combined with surrogate models (such as response surfaces or Kriging), and building a high-fidelity surrogate model over many design variables is itself very difficult and time-consuming. In addition, the existing mainstream methods all require some degree of manual participation, with human-computer interaction interleaved into the optimization process: the work is guided by the designers' expertise, for example in choosing an intelligent algorithm's initial population and setting its parameters. Fully automatic optimization therefore cannot be achieved, and the iteration efficiency of the optimization process is reduced.
In summary, existing global aerodynamic optimization methods either tend to fall into local optima or converge slowly, and their parameter configuration and computation require manual participation, making efficient fully automatic optimization difficult to achieve. The industry urgently needs an aerodynamic optimization method with strong global optimization capability, fast convergence, and no manual participation.
Disclosure of Invention
To solve the problem that existing aircraft aerodynamic optimization algorithms cannot achieve global optimization capability and fast convergence at the same time, the invention provides an aircraft global aerodynamic optimization method based on reinforcement learning and transfer learning; the method needs no manual intervention in the final optimization stage, achieves fully automatic optimization, and further improves optimization efficiency. For a given aircraft aerodynamic optimization problem, a reinforcement learning neural network is first constructed, and optimization experience is pre-extracted and stored in the network through interactive training against a semi-empirical engineering estimation method. Second, another reinforcement learning neural network is constructed and initialized by transfer learning: the optimization design experience extracted by the first network is migrated to the second by sharing neural network parameters. Third, the second reinforcement learning neural network is combined with high-fidelity fluid simulation (CFD), and an excellent aerodynamic layout design is obtained through interactive training of the second network against the CFD.
The invention makes full use of the strong global optimization capability of reinforcement learning. Optimization design experience is first extracted from the semi-empirical estimation method; this experience gives the second reinforcement learning network a higher starting point and substantial prior knowledge before optimization begins, greatly shortening the time needed for neural network training to converge while still retaining the global optimization capability of reinforcement learning, so that global optimization capability and convergence speed are obtained together. Because the semi-empirical estimation method computes far faster than high-fidelity fluid simulation, the time spent on the experience-extraction stage is negligible, and the invention is of great engineering value for aerodynamic optimization based on high-fidelity fluid simulation.
To solve the global aerodynamic optimization problem of aircraft, the invention adopts the following technical scheme: a fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning, characterized by comprising the following steps:
step one, establishing a parameterization of the aircraft aerodynamic shape and selecting the resulting parameters as design variables; the parameterization selects geometric parameters that determine the aerodynamic shape for the given optimization problem; taking wing optimization as an example, the parameters include the wingspan, the wingtip chord length, the wing-root chord length, the leading-edge sweep angle, and so on;
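As an illustration of step one, the planform parameters can be collected into a normalized design vector for the learning algorithm. A minimal Python sketch; the field names and bounds are assumptions for illustration, not values from the invention:

```python
from dataclasses import dataclass, astuple

@dataclass
class WingDesign:
    """Planform design variables for the wing-optimization example.
    Field names and bounds are illustrative assumptions."""
    span: float            # wingspan (m)
    tip_chord: float       # wingtip chord length (m)
    root_chord: float      # wing-root chord length (m)
    le_sweep_deg: float    # leading-edge sweep angle (deg)

# Assumed box bounds for each variable, in dataclass field order.
BOUNDS = {
    "span": (4.0, 12.0),
    "tip_chord": (0.2, 1.5),
    "root_chord": (0.5, 3.0),
    "le_sweep_deg": (0.0, 45.0),
}

def to_unit_vector(d: WingDesign) -> list[float]:
    """Normalize each variable to [0, 1] so the RL policy sees a
    uniform state/action scale."""
    out = []
    for name, value in zip(BOUNDS, astuple(d)):
        lo, hi = BOUNDS[name]
        out.append((value - lo) / (hi - lo))
    return out
```

Normalizing to the unit box is a common preconditioning step before feeding design variables to a neural policy.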
step two, establishing reinforcement learning environments based on a semi-empirical estimation method and on high-fidelity fluid simulation respectively; the basic approach is to pass the design parameters into each environment through batch commands and to compute and return aerodynamic performance indexes using the semi-empirical estimation method and the high-fidelity fluid simulation respectively; the high-fidelity fluid simulation is based on a finite element method and obtains results by solving the Navier-Stokes equations, so the simulation environment must include a computational mesh built on a reference 3D model, supported by a mesh-deformation technique so that the mesh adapts automatically as the design parameters change; this avoids repeated modeling and repeated meshing and lays the foundation for fully automatic optimization;
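The batch-command environment of step two can be sketched as a thin wrapper that writes an input deck, invokes the external solver, and parses the returned aerodynamic coefficients. The executable name, deck template, and output format below are hypothetical placeholders; real DATCOM and Fluent decks are considerably more involved:

```python
import subprocess
from pathlib import Path

class SolverEnv:
    """RL environment wrapping an external aerodynamics code (a
    DATCOM-style semi-empirical tool or a CFD solver) behind a
    batch-command interface. File names and formats are illustrative."""

    def __init__(self, exe, workdir, template):
        self.exe = exe                  # path to the solver executable
        self.workdir = Path(workdir)    # scratch directory for each run
        self.template = template        # input-deck template string

    def step(self, design_params):
        """Write the input deck, run the solver, return performance."""
        deck = self.template.format(**design_params)
        (self.workdir / "case.in").write_text(deck)
        subprocess.run([self.exe, "case.in"], cwd=self.workdir, check=True)
        return self._parse(self.workdir / "case.out")

    @staticmethod
    def _parse(out_file):
        """Pull coefficients from a simple 'name value' results file."""
        coeffs = {}
        for line in out_file.read_text().split("\n"):
            parts = line.split()
            if len(parts) == 2:
                coeffs[parts[0]] = float(parts[1])
        return coeffs
```

The same interface can front both environments, so the training loop does not care whether the reward comes from semi-empirical estimation or from CFD.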
step three, establishing a reinforcement learning neural network for optimization-experience extraction; the input of the whole network is the design-variable values and its output is the change of the design variables; the network consists of a value-estimation ("evaluator") network and a policy ("performer") network, where the value network evaluates the long-term expected return of the design-variable changes output by the policy network, and the policy-network parameters are updated with the following policy gradient:
$$\nabla_{\theta^{\mu}} J = \mathbb{E}\!\left[\,\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{a=\pi(s\mid\theta^{\mu})}\,\nabla_{\theta^{\mu}}\pi(s\mid\theta^{\mu})\right]$$
where J is the expected return, θ^μ are the parameters of the "performer" network, θ^Q are the parameters of the "evaluator" network, E denotes the expected value, a and s are the action and state in reinforcement learning, π is the action-selection function, and Q is the action-evaluation function. The policy network outputs the corresponding design-variable change for a given design variable; the value-network parameters are updated with the following experience-replay loss:
$$L=\frac{1}{N}\sum_{i=1}^{N}\Big(r_{i}+\gamma\,Q'\big(s_{i+1},\pi(s_{i+1})\big)-Q\big(s_{i},a_{i}\mid\theta^{Q}\big)\Big)^{2}$$
where N is the number of samples drawn from the experience pool, r_i is the reward, γ is the discount factor on future rewards, and Q' is the evaluation function from the previous step;
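The two updates above are the standard "performer-evaluator" (actor-critic) updates of deterministic-policy-gradient reinforcement learning. A toy numerical sketch with linear one-dimensional networks, purely to make the policy gradient and the replay loss concrete (not the invention's actual deep TensorFlow model):

```python
import numpy as np

# Toy linear "networks": Q(s, a) = w_q[0]*s + w_q[1]*a, pi(s) = w_pi*s.

def q_value(w_q, s, a):
    """Action-evaluation ('evaluator') function Q(s, a | theta_Q)."""
    return w_q[0] * s + w_q[1] * a

def policy(w_pi, s):
    """Action-selection ('performer') function pi(s | theta_mu)."""
    return w_pi * s

def replay_loss(w_q, w_q_prev, w_pi, batch, gamma=0.99):
    """Mean-squared TD error over a replay batch of (s, a, r, s')
    tuples, with Q' played by the previous-step parameters w_q_prev."""
    err = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * q_value(w_q_prev, s_next, policy(w_pi, s_next))
        err += (y - q_value(w_q, s, a)) ** 2
    return err / len(batch)

def policy_gradient(w_q, w_pi, states):
    """Deterministic policy gradient dJ/dw_pi = E[dQ/da * dpi/dw_pi];
    for the linear toy model dQ/da = w_q[1] and dpi/dw_pi = s."""
    return float(np.mean([w_q[1] * s for s in states]))
```

Gradient-ascent on `policy_gradient` improves the performer; gradient-descent on `replay_loss` improves the evaluator, exactly as in the two formulas above.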
step four, setting a reward function according to the objective function and constraints of the given aerodynamic optimization problem; on this basis, training the network established in step three in the semi-empirical-estimation reinforcement learning environment, using the strong global optimization capability of reinforcement learning to obtain the optimal design-variable values; during training, optimization experience for the given problem is extracted from the semi-empirical estimation method and stored in the network in the form of neural network parameters;
step five, establishing a reinforcement learning neural network for the final optimization, with the same structure and configuration as the network in step three, and migrating the extracted optimization experience into it by transfer learning, that is, selecting some of the layer parameters of the network trained in step four and copying them into the newly established reinforcement learning network;
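The parameter migration of step five can be sketched as a simple copy of selected layers between two parameter sets. Here the parameters are plain name-to-array dictionaries; with TensorFlow or PyTorch the same idea becomes a loop over named variables or `state_dict` entries, and the layer names are illustrative:

```python
def transfer_parameters(source_params, target_params, layers_to_copy):
    """Initialize the second RL network by copying selected layer
    parameters from the trained first network (transfer learning).

    source_params / target_params: dicts mapping layer name -> weights.
    layers_to_copy: names of layers to migrate; the rest keep the
    target network's own (e.g. random) initialization."""
    migrated = dict(target_params)                  # keep non-copied layers
    for name in layers_to_copy:
        migrated[name] = list(source_params[name])  # copy, don't alias
    return migrated
```

Copying only the hidden layers while leaving the output layer to be relearned matches the example in the detailed description, where hidden-layer parameters are extracted after the DATCOM-stage training.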
step six, training the network established in step five in the high-fidelity fluid-simulation reinforcement learning environment with the same reward function as in step four, until training converges and the design-variable values with the best performance are obtained, thereby reusing the optimization experience extracted in step four.
Features and beneficial effects of the invention:
1. Reinforcement learning environments based on a semi-empirical estimation method and on high-fidelity fluid simulation are established; the former is used to extract optimization experience and the latter for the final optimization, and the latter integrates a computational mesh with a mesh-deformation technique, avoiding manual re-modeling and re-meshing;
2. In the semi-empirical-estimation reinforcement learning environment, the strong global optimization capability of reinforcement learning is used to extract optimization design experience for the given aircraft aerodynamic optimization problem from the semi-empirical estimation method;
3. Through transfer learning, the extracted experience is applied to the final optimization based on high-fidelity fluid simulation, with the neural network replacing the human designer in providing suggestions and guidance during optimization. Compared with traditional optimization methods, convergence is greatly accelerated while global optimization capability is preserved; the human-computer interaction link is eliminated from the final optimization stage, the iteration efficiency of the optimization process is greatly improved, and high-fidelity aircraft design parameters are obtained quickly.
Drawings
FIG. 1 is a schematic diagram of the aerodynamic layout optimization method based on reinforcement learning and transfer learning
FIG. 2 is a schematic diagram of the inputs and outputs of the reinforcement learning neural network for aerodynamic optimization
FIG. 3 is a schematic diagram of the high-fidelity fluid-simulation reinforcement learning environment
Detailed Description
The present invention is further described below by way of example; the software, file formats and platforms described here are intended to aid understanding of the invention, not to limit its scope to the examples described.
First, a missile is selected as the object and the aerodynamic shape of its wing is optimized. The optimization objective is to increase the lift-to-drag ratio while keeping the aerodynamic center essentially unchanged. The design parameters are the distance from the wing-root leading edge to the nose tip, the span, the wingtip chord length, the wing-root chord length, and the leading- and trailing-edge sweep angles of the wing surface.
Reinforcement learning environments are then established based on the semi-empirical estimation software DATCOM and the high-fidelity fluid-simulation software Fluent. For DATCOM, a calculation file is written in Python containing the flight conditions, the aircraft design parameters, the invocation of the DATCOM program, and the output of the results. For Fluent, a calculation file is written in Python containing the flight conditions, the aircraft design parameters, the command-line driving code for the RBF Morph mesh-deformation software, the Fluent solver configuration, the invocation of the Fluent program, and the output of the results.
Two actor-critic ("performer-evaluator") deep reinforcement learning networks are then built in Python on the Google TensorFlow platform, and a reward function is designed that balances the lift-to-drag ratio against the shift of the pressure center:
[Reward-function formula rendered only as an image in the source; it trades the lift-to-drag ratio C_L/C_D off against the aerodynamic-center shift ΔXCP.]
where C_L and C_D are the lift and drag coefficients respectively, and ΔXCP is the shift of the aerodynamic center, computed as the ratio of the distance from the nose to the aerodynamic center to the total missile length. One network is first trained against the DATCOM environment; after training converges, its hidden-layer parameters are extracted and copied into the other network as initial values; the transfer-initialized network is then trained against the Fluent environment until convergence, yielding the optimal design parameters.
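The two-stage workflow of this example can be outlined as follows. The reward weighting is an assumption for illustration (the patent shows the formula only as an image), and `train` and `copy_hidden` stand in for the TensorFlow training and weight-copying code:

```python
def reward(cl, cd, dxcp, penalty=10.0):
    """Illustrative reward balancing the lift-to-drag ratio against
    the aerodynamic-center shift; the penalty weight is an assumption,
    not the patent's exact formula."""
    return cl / cd - penalty * abs(dxcp)

def two_stage_optimization(train, copy_hidden, env_datcom, env_fluent):
    """Outline of the schedule: pre-train in the cheap semi-empirical
    DATCOM environment, migrate the hidden-layer parameters, then
    fine-tune against the Fluent CFD environment.

    train(env, init=None) -> trained network; copy_hidden(net) -> a
    fresh network initialized with net's hidden layers. Both are
    caller-supplied callables (placeholders for real training code)."""
    net_a = train(env_datcom)             # stage 1: extract experience
    net_b = copy_hidden(net_a)            # transfer-learning init
    return train(env_fluent, init=net_b)  # stage 2: high-fidelity tuning
```

Because stage 1 runs against the fast semi-empirical code, its cost is negligible next to the CFD stage, which is the rationale the description gives for the overall scheme.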
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present invention, but are not intended to limit the scope of the present invention, and all equivalent modifications, equivalent substitutions and obvious changes based on the present invention within the spirit and principle of the present invention are included in the scope of the present invention.

Claims (1)

1. A fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning, characterized by comprising the following steps:
step one, establishing a parameterization of the aircraft aerodynamic shape and selecting the resulting parameters as design variables; the parameterization selects geometric parameters that determine the aerodynamic shape for the given optimization problem; taking wing optimization as an example, the parameters include the wingspan, the wingtip chord length, the wing-root chord length, the leading-edge sweep angle, and so on;
step two, establishing reinforcement learning environments based on a semi-empirical estimation method and on high-fidelity fluid simulation respectively; the basic approach is to pass the design parameters into each environment through batch commands and to compute and return aerodynamic performance indexes using the semi-empirical estimation method and the high-fidelity fluid simulation respectively; the high-fidelity fluid simulation is based on a finite element method and obtains results by solving the Navier-Stokes equations, so the simulation environment comprises a computational mesh built on a reference 3D model and is supported by a mesh-deformation technique, so that the mesh adapts automatically as the design parameters change; this avoids repeated modeling and repeated meshing and lays the foundation for fully automatic optimization;
step three, establishing a reinforcement learning neural network for optimization-experience extraction; the input of the network is the design-variable values and its output is the change of the design variables; the network consists of a value-estimation ("evaluator") network and a policy ("performer") network, where the value network judges the policy output by the policy network according to its long-term return, and the policy-network parameters are updated with the following policy gradient:
$$\nabla_{\theta^{\mu}} J = \mathbb{E}\!\left[\,\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{a=\pi(s\mid\theta^{\mu})}\,\nabla_{\theta^{\mu}}\pi(s\mid\theta^{\mu})\right]$$
where J is the expected return, θ^μ are the parameters of the "performer" network, θ^Q are the parameters of the "evaluator" network, E denotes the expected value, a and s are the action and state in reinforcement learning, π is the action-selection function, and Q is the action-evaluation function; the policy network outputs the corresponding design-variable change for a given design variable, and the value-network parameters are updated with the following experience-replay loss:
$$L=\frac{1}{N}\sum_{i=1}^{N}\Big(r_{i}+\gamma\,Q'\big(s_{i+1},\pi(s_{i+1})\big)-Q\big(s_{i},a_{i}\mid\theta^{Q}\big)\Big)^{2}$$
where N is the number of samples drawn from the experience pool, r_i is the reward, γ is the discount factor on future rewards, and Q' is the evaluation function from the previous step;
step four, setting a reward function according to the objective function and constraints of the given aerodynamic optimization problem; on this basis, training the network established in step three in the semi-empirical-estimation reinforcement learning environment, using the strong global optimization capability of reinforcement learning to obtain the optimal design-variable values; during training, optimization experience for the given problem is extracted from the semi-empirical estimation method and stored in the network in the form of neural network parameters;
step five, establishing a reinforcement learning neural network for the final optimization, with the same structure and configuration as the network in step three, and migrating the extracted optimization experience into it by transfer learning, that is, selecting some of the layer parameters of the network trained in step four and copying them into the newly established reinforcement learning network;
step six, training the network established in step five in the high-fidelity fluid-simulation reinforcement learning environment with the same reward function as in step four, until training converges and the design-variable values with the best performance are obtained, thereby reusing the optimization experience extracted in step four.
Application CN201811217192.6A, filed 2018-10-18 (priority date 2018-10-18); granted as CN109614631B (legal status: Active).

Publications (2)

CN109614631A, published 2019-04-12
CN109614631B (grant), published 2022-10-14


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant