CN109614631B - Fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning


Info

Publication number: CN109614631B
Application number: CN201811217192.6A
Authority: CN (China)
Prior art keywords: optimization, network, reinforcement learning, parameters, experience
Legal status: Active (granted)
Other versions: CN109614631A (Chinese)
Inventors: 闫星辉, 朱纪洪, 匡敏驰, 王吴凡, 史恒
Original and current assignee: Tsinghua University
Priority and filing date: 2018-10-18
Application filed by Tsinghua University

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/23Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation


Abstract

The invention discloses a fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning. It addresses the tendency of existing aerodynamic optimization methods to fall into local optima or to converge slowly, and it removes manual intervention from the final high-fidelity optimization stage, further improving optimization efficiency. The technical scheme is as follows: first, reinforcement learning environments based on semi-empirical estimation and on high-fidelity fluid simulation are established; a reinforcement learning neural network is then built and a reward function is set. Using the global optimization capability of reinforcement learning, optimization experience is extracted from the semi-empirical estimation method during network training and stored in the network parameters. A second reinforcement learning neural network is then constructed, the extracted optimization experience is migrated to it by transfer learning, and this network is applied to aerodynamic optimization based on high-fidelity fluid simulation; training it finally yields high-fidelity design parameters with excellent aerodynamic performance. Compared with the background-art methods, the method converges faster while retaining strong global optimization capability, and is of great engineering value for high-fidelity aerodynamic optimization.

Description

Fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning
Technical Field
The invention belongs to the technical field of aircraft engineering, and particularly relates to a global aerodynamic optimization method of an aircraft based on reinforcement learning and transfer learning.
Background
Aerodynamic optimization refers to designing the shape and relative positions of the main components of an aircraft so as to obtain the design with the best aerodynamic performance while satisfying given constraints. Mainstream aerodynamic optimization methods fall into two broad categories: gradient-based methods and traditional intelligent methods. Gradient-based methods optimize single-extremum problems efficiently, but aerodynamic optimization is mostly a complex multi-extremum problem, on which gradient-based methods easily fall into local optima and cannot meet the requirements of global aerodynamic optimization. Traditional intelligent methods mainly include genetic algorithms, particle swarm algorithms, and the like; these have better global optimization capability and suit complex multi-extremum problems, but they converge slowly and are difficult to apply directly to time-consuming high-fidelity fluid simulation. They are therefore usually combined with surrogate models (such as response surfaces or Kriging), and building a high-fidelity surrogate model over many design variables is itself very difficult and time-consuming. In addition, the existing mainstream methods all require some degree of manual participation, with human-computer interaction interleaved into the optimization process: the work is guided by the designers' expertise, for example in choosing an intelligent algorithm's initial population and setting its parameters. Fully automatic optimization therefore cannot be achieved, and the iteration efficiency of the optimization process is reduced.
In summary, existing global aerodynamic optimization methods either tend to fall into local optima or converge slowly, and their parameter configuration and computation require manual participation, making efficient fully automatic optimization difficult to achieve. The industry urgently needs an aerodynamic optimization method with strong global optimization capability, fast convergence, and no manual participation.
Disclosure of Invention
To solve the problem that existing aircraft aerodynamic optimization algorithms cannot achieve global optimization capability and fast convergence at the same time, the invention provides an aircraft global aerodynamic optimization method based on reinforcement learning and transfer learning; the method needs no manual intervention in the final optimization stage, achieves fully automatic optimization, and further improves optimization efficiency. For a given aircraft aerodynamic optimization problem, a reinforcement learning neural network is first constructed, and optimization experience is pre-extracted and stored in the network through interactive training against a semi-empirical engineering estimation method. Second, another reinforcement learning neural network is constructed and initialized by transfer learning: the optimization design experience extracted by the first network is migrated to the second by sharing neural network parameters. Third, the second reinforcement learning neural network is combined with high-fidelity fluid simulation (CFD), and an excellent aerodynamic layout design is obtained through interactive training of the second network against the CFD.
The invention makes full use of the strong global optimization capability of reinforcement learning. Optimization design experience is first extracted from the semi-empirical estimation method; this experience gives the second reinforcement learning network a higher starting point and substantial prior knowledge before optimization begins, greatly shortening the time needed for neural network training to converge while still retaining the global optimization capability of reinforcement learning, so that global optimization capability and convergence speed are obtained together. Because the semi-empirical estimation method computes far faster than high-fidelity fluid simulation, the time spent on the experience-extraction stage is negligible, and the invention is of great engineering value for aerodynamic optimization based on high-fidelity fluid simulation.
To solve the global aerodynamic optimization problem of aircraft, the invention adopts the following technical scheme: a fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning, characterized by comprising the following steps:
step one, establishing a parameterization of the aircraft aerodynamic shape and selecting the resulting parameters as design variables; the parameterization selects geometric parameters that determine the aerodynamic shape for the given optimization problem; taking wing optimization as an example, the parameters include the wingspan, the wingtip chord length, the wing-root chord length, the leading-edge sweep angle, and so on;
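As an illustration of step one, the planform parameters can be collected into a normalized design vector for the learning algorithm. A minimal Python sketch; the field names and bounds are assumptions for illustration, not values from the invention:

```python
from dataclasses import dataclass, astuple

@dataclass
class WingDesign:
    """Planform design variables for the wing-optimization example.
    Field names and bounds are illustrative assumptions."""
    span: float            # wingspan (m)
    tip_chord: float       # wingtip chord length (m)
    root_chord: float      # wing-root chord length (m)
    le_sweep_deg: float    # leading-edge sweep angle (deg)

# Assumed box bounds for each variable, in dataclass field order.
BOUNDS = {
    "span": (4.0, 12.0),
    "tip_chord": (0.2, 1.5),
    "root_chord": (0.5, 3.0),
    "le_sweep_deg": (0.0, 45.0),
}

def to_unit_vector(d: WingDesign) -> list[float]:
    """Normalize each variable to [0, 1] so the RL policy sees a
    uniform state/action scale."""
    out = []
    for name, value in zip(BOUNDS, astuple(d)):
        lo, hi = BOUNDS[name]
        out.append((value - lo) / (hi - lo))
    return out
```

Normalizing to the unit box is a common preconditioning step before feeding design variables to a neural policy.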
step two, establishing reinforcement learning environments based on a semi-empirical estimation method and on high-fidelity fluid simulation respectively; the basic approach is to pass the design parameters into each environment through batch commands and to compute and return aerodynamic performance indexes using the semi-empirical estimation method and the high-fidelity fluid simulation respectively; the high-fidelity fluid simulation is based on a finite element method and obtains results by solving the Navier-Stokes equations, so the simulation environment must include a computational mesh built on a reference 3D model, supported by a mesh-deformation technique so that the mesh adapts automatically as the design parameters change; this avoids repeated modeling and repeated meshing and lays the foundation for fully automatic optimization;
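The batch-command environment of step two can be sketched as a thin wrapper that writes an input deck, invokes the external solver, and parses the returned aerodynamic coefficients. The executable name, deck template, and output format below are hypothetical placeholders; real DATCOM and Fluent decks are considerably more involved:

```python
import subprocess
from pathlib import Path

class SolverEnv:
    """RL environment wrapping an external aerodynamics code (a
    DATCOM-style semi-empirical tool or a CFD solver) behind a
    batch-command interface. File names and formats are illustrative."""

    def __init__(self, exe, workdir, template):
        self.exe = exe                  # path to the solver executable
        self.workdir = Path(workdir)    # scratch directory for each run
        self.template = template        # input-deck template string

    def step(self, design_params):
        """Write the input deck, run the solver, return performance."""
        deck = self.template.format(**design_params)
        (self.workdir / "case.in").write_text(deck)
        subprocess.run([self.exe, "case.in"], cwd=self.workdir, check=True)
        return self._parse(self.workdir / "case.out")

    @staticmethod
    def _parse(out_file):
        """Pull coefficients from a simple 'name value' results file."""
        coeffs = {}
        for line in out_file.read_text().split("\n"):
            parts = line.split()
            if len(parts) == 2:
                coeffs[parts[0]] = float(parts[1])
        return coeffs
```

The same interface can front both environments, so the training loop does not care whether the reward comes from semi-empirical estimation or from CFD.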
step three, establishing a reinforcement learning neural network for optimization-experience extraction; the input of the whole network is the design-variable values and its output is the change of the design variables; the network consists of a value-estimation ("evaluator") network and a policy ("performer") network, where the value network evaluates the long-term expected return of the design-variable changes output by the policy network, and the policy-network parameters are updated with the following policy gradient:
$$\nabla_{\theta^{\mu}} J = \mathbb{E}\!\left[\,\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{a=\pi(s\mid\theta^{\mu})}\,\nabla_{\theta^{\mu}}\pi(s\mid\theta^{\mu})\right]$$
where J is the expected return, θ^μ are the parameters of the "performer" network, θ^Q are the parameters of the "evaluator" network, E denotes the expected value, a and s are the action and state in reinforcement learning, π is the action-selection function, and Q is the action-evaluation function. The policy network outputs the corresponding design-variable change for a given design variable; the value-network parameters are updated with the following experience-replay loss:
$$L=\frac{1}{N}\sum_{i=1}^{N}\Big(r_{i}+\gamma\,Q'\big(s_{i+1},\pi(s_{i+1})\big)-Q\big(s_{i},a_{i}\mid\theta^{Q}\big)\Big)^{2}$$
where N is the number of samples drawn from the experience pool, r_i is the reward, γ is the discount factor on future rewards, and Q' is the evaluation function from the previous step;
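The two updates above are the standard "performer-evaluator" (actor-critic) updates of deterministic-policy-gradient reinforcement learning. A toy numerical sketch with linear one-dimensional networks, purely to make the policy gradient and the replay loss concrete (not the invention's actual deep TensorFlow model):

```python
import numpy as np

# Toy linear "networks": Q(s, a) = w_q[0]*s + w_q[1]*a, pi(s) = w_pi*s.

def q_value(w_q, s, a):
    """Action-evaluation ('evaluator') function Q(s, a | theta_Q)."""
    return w_q[0] * s + w_q[1] * a

def policy(w_pi, s):
    """Action-selection ('performer') function pi(s | theta_mu)."""
    return w_pi * s

def replay_loss(w_q, w_q_prev, w_pi, batch, gamma=0.99):
    """Mean-squared TD error over a replay batch of (s, a, r, s')
    tuples, with Q' played by the previous-step parameters w_q_prev."""
    err = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * q_value(w_q_prev, s_next, policy(w_pi, s_next))
        err += (y - q_value(w_q, s, a)) ** 2
    return err / len(batch)

def policy_gradient(w_q, w_pi, states):
    """Deterministic policy gradient dJ/dw_pi = E[dQ/da * dpi/dw_pi];
    for the linear toy model dQ/da = w_q[1] and dpi/dw_pi = s."""
    return float(np.mean([w_q[1] * s for s in states]))
```

Gradient-ascent on `policy_gradient` improves the performer; gradient-descent on `replay_loss` improves the evaluator, exactly as in the two formulas above.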
step four, setting a reward function according to the objective function and constraints of the given aerodynamic optimization problem; on this basis, training the network established in step three in the semi-empirical-estimation reinforcement learning environment, using the strong global optimization capability of reinforcement learning to obtain the optimal design-variable values; during training, optimization experience for the given problem is extracted from the semi-empirical estimation method and stored in the network in the form of neural network parameters;
step five, establishing a reinforcement learning neural network for the final optimization, with the same structure and configuration as the network in step three, and migrating the extracted optimization experience into it by transfer learning, that is, selecting some of the layer parameters of the network trained in step four and copying them into the newly established reinforcement learning network;
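The parameter migration of step five can be sketched as a simple copy of selected layers between two parameter sets. Here the parameters are plain name-to-array dictionaries; with TensorFlow or PyTorch the same idea becomes a loop over named variables or `state_dict` entries, and the layer names are illustrative:

```python
def transfer_parameters(source_params, target_params, layers_to_copy):
    """Initialize the second RL network by copying selected layer
    parameters from the trained first network (transfer learning).

    source_params / target_params: dicts mapping layer name -> weights.
    layers_to_copy: names of layers to migrate; the rest keep the
    target network's own (e.g. random) initialization."""
    migrated = dict(target_params)                  # keep non-copied layers
    for name in layers_to_copy:
        migrated[name] = list(source_params[name])  # copy, don't alias
    return migrated
```

Copying only the hidden layers while leaving the output layer to be relearned matches the example in the detailed description, where hidden-layer parameters are extracted after the DATCOM-stage training.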
step six, training the network established in step five in the high-fidelity fluid-simulation reinforcement learning environment with the same reward function as in step four, until training converges and the design-variable values with the best performance are obtained, thereby reusing the optimization experience extracted in step four.
Features and beneficial effects of the invention:
1. Reinforcement learning environments based on a semi-empirical estimation method and on high-fidelity fluid simulation are established; the former is used to extract optimization experience and the latter for the final optimization, and the latter integrates a computational mesh with a mesh-deformation technique, avoiding manual re-modeling and re-meshing;
2. In the semi-empirical-estimation reinforcement learning environment, the strong global optimization capability of reinforcement learning is used to extract optimization design experience for the given aircraft aerodynamic optimization problem from the semi-empirical estimation method;
3. Through transfer learning, the extracted experience is applied to the final optimization based on high-fidelity fluid simulation, with the neural network replacing the human designer in providing suggestions and guidance during optimization. Compared with traditional optimization methods, convergence is greatly accelerated while global optimization capability is preserved; the human-computer interaction link is eliminated from the final optimization stage, the iteration efficiency of the optimization process is greatly improved, and high-fidelity aircraft design parameters are obtained quickly.
Drawings
FIG. 1 is a schematic diagram of the aerodynamic layout optimization method based on reinforcement learning and transfer learning
FIG. 2 is a schematic diagram of the inputs and outputs of the reinforcement learning neural network for aerodynamic optimization
FIG. 3 is a schematic diagram of the high-fidelity fluid-simulation reinforcement learning environment
Detailed Description
The present invention is further described below by way of example; the software, file formats and platforms described here are intended to aid understanding of the invention, not to limit its scope to the examples described.
First, a missile is selected as the object and the aerodynamic shape of its wing is optimized. The optimization objective is to increase the lift-to-drag ratio while keeping the aerodynamic center essentially unchanged. The design parameters are the distance from the wing-root leading edge to the nose tip, the span, the wingtip chord length, the wing-root chord length, and the leading- and trailing-edge sweep angles of the wing surface.
Reinforcement learning environments are then established based on the semi-empirical estimation software DATCOM and the high-fidelity fluid-simulation software Fluent. For DATCOM, a calculation file is written in Python containing the flight conditions, the aircraft design parameters, the invocation of the DATCOM program, and the output of the results. For Fluent, a calculation file is written in Python containing the flight conditions, the aircraft design parameters, the command-line driving code for the RBF Morph mesh-deformation software, the Fluent solver configuration, the invocation of the Fluent program, and the output of the results.
Two actor-critic ("performer-evaluator") deep reinforcement learning networks are then built in Python on the Google TensorFlow platform, and a reward function is designed that balances the lift-to-drag ratio against the shift of the pressure center:
[Reward-function formula rendered only as an image in the source; it trades the lift-to-drag ratio C_L/C_D off against the aerodynamic-center shift ΔXCP.]
where C_L and C_D are the lift and drag coefficients respectively, and ΔXCP is the shift of the aerodynamic center, computed as the ratio of the distance from the nose to the aerodynamic center to the total missile length. One network is first trained against the DATCOM environment; after training converges, its hidden-layer parameters are extracted and copied into the other network as initial values; the transfer-initialized network is then trained against the Fluent environment until convergence, yielding the optimal design parameters.
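The two-stage workflow of this example can be outlined as follows. The reward weighting is an assumption for illustration (the patent shows the formula only as an image), and `train` and `copy_hidden` stand in for the TensorFlow training and weight-copying code:

```python
def reward(cl, cd, dxcp, penalty=10.0):
    """Illustrative reward balancing the lift-to-drag ratio against
    the aerodynamic-center shift; the penalty weight is an assumption,
    not the patent's exact formula."""
    return cl / cd - penalty * abs(dxcp)

def two_stage_optimization(train, copy_hidden, env_datcom, env_fluent):
    """Outline of the schedule: pre-train in the cheap semi-empirical
    DATCOM environment, migrate the hidden-layer parameters, then
    fine-tune against the Fluent CFD environment.

    train(env, init=None) -> trained network; copy_hidden(net) -> a
    fresh network initialized with net's hidden layers. Both are
    caller-supplied callables (placeholders for real training code)."""
    net_a = train(env_datcom)             # stage 1: extract experience
    net_b = copy_hidden(net_a)            # transfer-learning init
    return train(env_fluent, init=net_b)  # stage 2: high-fidelity tuning
```

Because stage 1 runs against the fast semi-empirical code, its cost is negligible next to the CFD stage, which is the rationale the description gives for the overall scheme.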
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present invention, but are not intended to limit the scope of the present invention, and all equivalent modifications, equivalent substitutions and obvious changes based on the present invention within the spirit and principle of the present invention are included in the scope of the present invention.

Claims (1)

1. A fully automatic aerodynamic optimization method for aircraft based on reinforcement learning and transfer learning, characterized by comprising the following steps:
step one, establishing a parameterization of the aircraft aerodynamic shape and selecting the resulting parameters as design variables; the parameterization selects geometric parameters that determine the aerodynamic shape for the given optimization problem; taking wing optimization as an example, the parameters include the wingspan, the wingtip chord length, the wing-root chord length, the leading-edge sweep angle, and so on;
step two, establishing reinforcement learning environments based on a semi-empirical estimation method and on high-fidelity fluid simulation respectively; the basic approach is to pass the design parameters into each environment through batch commands and to compute and return aerodynamic performance indexes using the semi-empirical estimation method and the high-fidelity fluid simulation respectively; the high-fidelity fluid simulation is based on a finite element method and obtains results by solving the Navier-Stokes equations, so the simulation environment comprises a computational mesh built on a reference 3D model and is supported by a mesh-deformation technique, so that the mesh adapts automatically as the design parameters change; this avoids repeated modeling and repeated meshing and lays the foundation for fully automatic optimization;
step three, establishing a reinforcement learning neural network for optimization-experience extraction; the input of the network is the design-variable values and its output is the change of the design variables; the network consists of a value-estimation ("evaluator") network and a policy ("performer") network, where the value network judges the policy output by the policy network according to its long-term return, and the policy-network parameters are updated with the following policy gradient:
$$\nabla_{\theta^{\mu}} J = \mathbb{E}\!\left[\,\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{a=\pi(s\mid\theta^{\mu})}\,\nabla_{\theta^{\mu}}\pi(s\mid\theta^{\mu})\right]$$
where J is the expected return, θ^μ are the parameters of the "performer" network, θ^Q are the parameters of the "evaluator" network, E denotes the expected value, a and s are the action and state in reinforcement learning, π is the action-selection function, and Q is the action-evaluation function; the policy network outputs the corresponding design-variable change for a given design variable, and the value-network parameters are updated with the following experience-replay loss:
$$L=\frac{1}{N}\sum_{i=1}^{N}\Big(r_{i}+\gamma\,Q'\big(s_{i+1},\pi(s_{i+1})\big)-Q\big(s_{i},a_{i}\mid\theta^{Q}\big)\Big)^{2}$$
where N is the number of samples drawn from the experience pool, r_i is the reward, γ is the discount factor on future rewards, and Q' is the evaluation function from the previous step;
step four, setting a reward function according to the objective function and constraints of the given aerodynamic optimization problem; on this basis, training the network established in step three in the semi-empirical-estimation reinforcement learning environment, using the strong global optimization capability of reinforcement learning to obtain the optimal design-variable values; during training, optimization experience for the given problem is extracted from the semi-empirical estimation method and stored in the network in the form of neural network parameters;
step five, establishing a reinforcement learning neural network for the final optimization, with the same structure and configuration as the network in step three, and migrating the extracted optimization experience into it by transfer learning, that is, selecting some of the layer parameters of the network trained in step four and copying them into the newly established reinforcement learning network;
step six, training the network established in step five in the high-fidelity fluid-simulation reinforcement learning environment with the same reward function as in step four, until training converges and the design-variable values with the best performance are obtained, thereby reusing the optimization experience extracted in step four.
Application CN201811217192.6A, filed 2018-10-18 (priority date 2018-10-18); granted as CN109614631B (legal status: Active).

Publications (2)

CN109614631A, published 2019-04-12
CN109614631B (grant), published 2022-10-14


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant