CN116956987B - Online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization - Google Patents

Online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization

Info

Publication number
CN116956987B
CN116956987B (application number CN202310946041.9A)
Authority
CN
China
Prior art keywords
particle
optimization
sub
optimizing
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310946041.9A
Other languages
Chinese (zh)
Other versions
CN116956987A (en)
Inventor
周宏宇
刘芳
方艺忠
刘佳琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202310946041.9A priority Critical patent/CN116956987B/en
Publication of CN116956987A publication Critical patent/CN116956987A/en
Application granted granted Critical
Publication of CN116956987B publication Critical patent/CN116956987B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization, belonging to the field of aircraft trajectory optimization and optimal control. The method addresses the low efficiency and low precision of existing online trajectory optimization methods. The invention analyzes the search mechanism of the particle swarm optimization method and uses a reinforcement learning agent to actively steer the movement trend of the particles, so that the agent can autonomously determine the search direction from the progress of the nonlinear optimization problem. This greatly improves the search performance of the particle swarm optimization method and the precision of the online trajectory optimization method, while the introduction of reinforcement learning significantly improves its efficiency. The method can be applied to online trajectory optimization of sub-orbital hypersonic carriers.

Description

Online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization
Technical Field
The invention belongs to the field of aircraft trajectory optimization and optimal control, and in particular relates to an online trajectory optimization method for a sub-orbital hypersonic carrier.
Background
The reusable sub-orbital hypersonic carrier is one of the important development directions of aerospace science and technology and a key approach to future global rapid-reach capability. A sub-orbital hypersonic carrier can take off from and land on a runway horizontally like an aircraft and can transition freely between near space and outer space, offering better economy, flexibility and adaptability than existing transportation systems.
Under state disturbances, environmental deviations, mission changes and similar conditions, the optimal trajectory of the sub-orbital hypersonic carrier must be planned online. Existing planning formulations mainly consider conventional trajectory optimization constraints with analytical expressions, whereas the sub-orbital hypersonic carrier exhibits large cross-domain, fast time-varying motion characteristics. Complex nonlinear couplings exist among its multimode combined propulsion, high lift-to-drag aerodynamic shape and the uncertain near-space environment, including propulsion-aerodynamics-structure-load-thermal interactions, and the corresponding multi-field coupling and constraint models cannot be established analytically; that is, a strongly coupled nonlinear trajectory optimization model cannot be written in closed form, which places new demands on the trajectory optimization method. For such strongly coupled nonlinear models, existing online trajectory optimization methods suffer from low efficiency and low precision. Existing hybrid optimization methods mainly adopt an initial-value-search-plus-fast-convergence strategy, for example generating initial values with particle swarm optimization (PSO) or genetic algorithms and then refining them with pseudospectral or convex optimization methods; however, this strategy does not fundamentally remove the deficiencies of the individual methods and only yields a superficial complementarity of their strengths.
Disclosure of Invention
The invention aims to solve the problems of low efficiency and low precision of existing online trajectory optimization methods, and provides an online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization comprises the following steps:
Step 1: set the values of the learning factors $c_1$ and $c_2$ and of the random numbers $\eta_1$, $\eta_2$ and $\eta_3$;
set the upper limit on the number of training rounds of the reinforcement learning agent to M and the upper limit on the number of particle swarm evolution generations within a single round to m;
Step 2: train the reinforcement learning agent;
Step 21: initialize the position and velocity of each particle in the swarm, and update the particle positions from the initialized positions and velocities;
Step 22: set the round counter to 1;
Step 23: set the evolution generation counter p = 1;
Step 24: compute the agent state from the position of each particle in the swarm;
Step 25: let the agent randomly explore an action;
Step 26: update the position of each particle in the swarm using the inertia weight $l_p$ contained in the agent's action;
Step 27: compute the agent reward from the updated particle positions; increase the generation counter p by 1 and check whether it is less than or equal to m;
if so, return to step 24;
otherwise, go to step 28;
Step 28: increase the round counter by 1 and check whether it is less than or equal to M;
if so, return to step 23;
if not, end the training;
Step 3: optimize the trajectory of the sub-orbital hypersonic carrier online with the trained reinforcement learning agent.
Further, the learning factor $c_1 = 1.5$, the learning factor $c_2 = 0.5$, the random numbers $\eta_1 \in [0.5, 1]$, $\eta_2 \in [0.5, 1]$ and $\eta_3 \in [0, 1]$, M = 2000, and m = 30.
Further, the specific process of step 24 is as follows:
$M_{pso}$ particles search over $D_{pso}$ optimization parameters. The position vector of the ith particle in the search space is $x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,D_{pso}}]^T$, whose components are the ith particle's search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; the velocity vector of the ith particle is $v_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,D_{pso}}]^T$, where the superscript T denotes the transpose and the components are the speeds at which the ith particle moves while searching for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; $i = 1, 2, \ldots, M_{pso}$ and $j = 1, 2, \ldots, D_{pso}$.
The historical best position vector found by all particles in the whole swarm is $p_g = [p_{g,1}, p_{g,2}, \ldots, p_{g,D_{pso}}]^T$, whose components are the global best search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; the historical best position vector found by the ith particle is $p_i = [p_{i,1}, p_{i,2}, \ldots, p_{i,D_{pso}}]^T$, whose components are that particle's best search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter.
All particles evolve according to formula (1), the inertia-weight particle swarm update, where $t_p$ is the current evolution generation; $v_{i,j}^{t_p}$ and $v_{i,j}^{t_p-1}$ are the speeds at which the ith particle moves while searching for the jth optimization parameter at generations $t_p$ and $t_p-1$; $l_p$ is the inertia weight; $p_{i,j}$ is the historical best position of the jth optimization parameter found by the ith particle; $p_{g,j}$ is the historical best position of the jth optimization parameter found by all particles; and $x_{i,j}^{t_p}$ and $x_{i,j}^{t_p-1}$ are the ith particle's search results for the jth optimization parameter at generations $t_p$ and $t_p-1$.
Introducing the random number $\eta_3$, formula (1) is rewritten as formula (2), in which $x_{i,j}^{t_p+1}$ denotes the ith particle's search result for the jth optimization parameter at generation $t_p+1$.
From formula (2) a second-order control system is obtained, in which $\omega_p$ is the frequency and $\xi_p$ the damping of the second-order control system.
The convergence time $t_{system}$ of the particles is then obtained, where ln denotes the natural logarithm and $\varepsilon_p$ is a set threshold.
The agent state s is defined in terms of: $G_{best}$, the performance index corresponding to the $p_g$ obtained at the current evolution generation; $G'_{best}$, the performance index corresponding to the $p_g$ of the previous generation; $G_i$, the performance indices corresponding to the $p_i$ of the current generation, of which the average avg(·) is taken; and the population diversity $\delta_{swarm}$.
Further, the inertia weight $l_p$ takes values in the range (0, 1].
Further, the threshold $\varepsilon_p$ is set to 0.02.
Further, the population diversity $\delta_{swarm}$ is computed from the particle positions, where $\bar{x}_j^{t_p-1}$ denotes the average position of all particles for the jth optimization parameter in the search space at generation $t_p-1$.
Further, the convergence state of the particles is expressed in terms of the mathematical expectation E, with $p_{i,j} = p$ and $p_{g,j} = g$.
Further, step 25 is specifically:
the action a of the agent is defined as
$a = [l_p, D_w]^T$  (13)
where the superscript T denotes the transpose and $D_w = \pm 1$;
when $D_w = 1$, the agent makes a particle-dispersion decision and assigns the inertia weight $l_p$ according to $\xi_p \le \varepsilon_p$;
when $D_w = -1$, the agent makes a particle-convergence decision and assigns the inertia weight $l_p$ according to $\xi_p > \varepsilon_p$.
further, the specific process of step 27 is as follows:
wherein r represents the rewarding value of the agent, G best Representing p after updating the particle positions in the particle swarm g Corresponding performance indexes.
Further, the specific process of step 3 is as follows:
Step 31: initialize the particle positions and velocities, and update the particle positions from the initialized positions and velocities;
Step 32: update the control variables of the sub-orbital hypersonic carrier according to the updated particle positions, the control variables comprising the angle of attack and the bank angle;
Step 33: compute the agent state from the updated particle positions;
Step 34: input the agent state computed in step 33 to the trained agent, which outputs an action;
Step 35: update the particle positions according to the inertia weight $l_p$ in the output action of the agent;
Step 36: repeat steps 32 to 35 until the sub-orbital hypersonic carrier reaches the target point specified by the mission.
The beneficial effects of the invention are as follows:
the invention explores the optimizing mechanism of the particle swarm optimization method, utilizes the reinforcement learning agent to actively control the movement trend of particles, so that the agent has the capability of autonomously determining the optimizing direction according to the optimizing process of the nonlinear optimization problem, greatly improves the optimizing performance of the particle swarm optimization method, improves the precision of the online track optimization method, and introduces the reinforcement learning method to obviously improve the efficiency of the online track optimization method.
Drawings
FIG. 1 is an analysis chart of an optimizing process of the conventional particle swarm optimization method;
FIG. 2 is a schematic diagram of over-damped, under-damped, and divergent states under different inertial weights;
FIG. 3 is a graph showing the variation of damping, frequency and convergence time with inertial weight for different learning factors;
FIG. 4 is a schematic diagram of a training process of reinforcement learning agents for group optimization states.
Detailed Description
The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization comprises the following steps:
Step 1: set the values of the learning factors $c_1$ and $c_2$ and of the random numbers $\eta_1$, $\eta_2$ and $\eta_3$;
set the upper limit on the number of training rounds of the reinforcement learning agent to M and the upper limit on the number of particle swarm evolution generations within a single round to m;
the reinforcement learning method adopts the proximal policy optimization (PPO) method;
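The agent only has to expose a small interface to the training loop that follows: it receives a four-component swarm observation and returns an inertia weight together with a dispersion/convergence flag. The sketch below fixes that interface only; the class and method names (SwarmPPOAgent, select_action, store_transition, finish_round) are illustrative assumptions, and the actual policy/value networks and the clipped-surrogate PPO update would come from any standard PPO implementation.

```python
from dataclasses import dataclass
import random

@dataclass
class SwarmObservation:
    G_best: float        # performance index of p_g at the current generation
    G_best_prev: float   # performance index of p_g at the previous generation
    G_i_avg: float       # average performance index of the personal bests p_i
    diversity: float     # population diversity delta_swarm

class SwarmPPOAgent:
    """Stand-in for the PPO agent: only the interface the hybrid loop needs.

    A real implementation would hold actor/critic networks and run the
    clipped-surrogate PPO update in finish_round()."""

    def select_action(self, obs: SwarmObservation):
        # Random exploration, as in step 25 of the training procedure.
        l_p = random.uniform(1e-3, 1.0)   # inertia weight in (0, 1]
        D_w = random.choice((-1, 1))      # convergence (-1) / dispersion (+1) decision
        return l_p, D_w

    def store_transition(self, obs, action, reward):
        pass  # buffer (s, a, r) for the end-of-round policy update

    def finish_round(self):
        pass  # PPO policy/value update using the buffered round
```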
Step 2: train the reinforcement learning agent, as shown in FIG. 4;
Step 21: initialize the position and velocity of each particle in the swarm, and update the particle positions from the initialized positions and velocities (i.e., according to formula (1));
Step 22: set the round counter to 1;
Step 23: set the evolution generation counter p = 1;
Step 24: compute the agent state from the position of each particle in the swarm;
Step 25: let the agent randomly explore an action;
Step 26: update the position of each particle in the swarm using the inertia weight $l_p$ contained in the agent's action (i.e., substitute the inertia weight $l_p$ into formula (2));
Step 27: compute the agent reward from the updated particle positions; increase the generation counter p by 1 and check whether it is less than or equal to m;
if so, return to step 24;
otherwise, go to step 28;
Step 28: increase the round counter by 1 and check whether it is less than or equal to M;
if so, return to step 23;
if not, end the training.
and for the current round, after calculating the agent rewards of each evolution in the current round, summing the agent rewards of each evolution in the current round, and adjusting the action selection of the next round according to the sum of the agent rewards of the current round.
Step 3: optimize the trajectory of the sub-orbital hypersonic carrier online with the trained reinforcement learning agent.
FIG. 1 is an analysis chart of the search process of the conventional particle swarm optimization method. The analysis of this search mechanism provided by the invention reveals the key factors governing the search process and the movement trend of the swarm, indicates the correct direction for improving the method, and overcomes the weak improvement effect and insufficient integration of existing combined methods. The invention exploits the environment-exploration mechanism of reinforcement learning so that the agent can autonomously determine the search direction from the progress of the nonlinear optimization problem, which improves the solution quality and efficiency and supports online planning and safe flight of the reusable sub-orbital carrier. By combining the particle swarm optimization method with reinforcement learning, the invention also avoids the high data demand, poor generalization and limited scenario applicability of stand-alone reinforcement learning (RL). The optimization method can likewise be applied to other complex nonlinear optimization problems.
The second embodiment differs from the first embodiment in that the learning factor $c_1 = 1.5$, the learning factor $c_2 = 0.5$, the random numbers $\eta_1 \in [0.5, 1]$, $\eta_2 \in [0.5, 1]$ and $\eta_3 \in [0, 1]$, M = 2000, and m = 30.
Other steps and parameters are the same as in the first embodiment.
The third embodiment differs from the first or second embodiment in that the specific process of step 24 is as follows:
$M_{pso}$ particles search over $D_{pso}$ optimization parameters. The position vector of the ith particle in the search space is $x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,D_{pso}}]^T$, whose components are the ith particle's search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter. The $D_{pso}$ optimization parameters are the design variables of the function that computes the angle of attack and the bank angle; substituting the optimized parameters into this function yields the angle of attack and the bank angle. The velocity vector of the ith particle in the search space is $v_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,D_{pso}}]^T$, where the superscript T denotes the transpose and the components are the speeds at which the ith particle moves while searching for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; $i = 1, 2, \ldots, M_{pso}$ and $j = 1, 2, \ldots, D_{pso}$.
The historical best position vector found by all particles in the whole swarm is $p_g = [p_{g,1}, p_{g,2}, \ldots, p_{g,D_{pso}}]^T$, whose components are the global best search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; the historical best position vector found by the ith particle is $p_i = [p_{i,1}, p_{i,2}, \ldots, p_{i,D_{pso}}]^T$, whose components are that particle's best search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter.
All particles evolve according to formula (1), the inertia-weight particle swarm update, where $t_p$ is the current evolution generation (with initial value 1); $v_{i,j}^{t_p}$ and $v_{i,j}^{t_p-1}$ are the speeds at which the ith particle moves while searching for the jth optimization parameter at generations $t_p$ and $t_p-1$; $l_p$ is the inertia weight; $p_{i,j}$ is the historical best position of the jth optimization parameter found by the ith particle; $p_{g,j}$ is the historical best position of the jth optimization parameter found by all particles; and $x_{i,j}^{t_p}$ and $x_{i,j}^{t_p-1}$ are the ith particle's search results for the jth optimization parameter at generations $t_p$ and $t_p-1$.
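A minimal sketch of one generation of the update that these symbol definitions describe, assuming the textbook inertia-weight form of the particle swarm update (formula (1) is characterized here only through its symbol definitions):

```python
import numpy as np

def pso_generation(x, v, p_i, p_g, l_p, c1=1.5, c2=0.5, rng=None):
    """One inertia-weight particle swarm generation (assumed textbook form).

    x, v : (M_pso, D_pso) current positions x_{i,j} and velocities v_{i,j}
    p_i  : (M_pso, D_pso) personal best positions p_{i,j}
    p_g  : (D_pso,)       global best position p_{g,j}
    l_p  : inertia weight chosen by the agent
    """
    if rng is None:
        rng = np.random.default_rng()
    eta1 = rng.uniform(0.5, 1.0, x.shape)   # random numbers eta_1, eta_2 from step 1
    eta2 = rng.uniform(0.5, 1.0, x.shape)
    v_new = l_p * v + c1 * eta1 * (p_i - x) + c2 * eta2 * (p_g - x)
    return x + v_new, v_new
```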
Introducing the random number $\eta_3$, formula (1) is rewritten as formula (2), in which $x_{i,j}^{t_p+1}$ denotes the ith particle's search result for the jth optimization parameter at generation $t_p+1$.
From formula (2) a typical second-order control system, formula (3), is obtained, in which $\omega_p$ is the frequency and $\xi_p$ the damping of the second-order control system.
Define $r_4 = c_1\eta_1 + c_2\eta_2 - 1$; then the particles converge when $r_4 < l_p$ and diverge when $r_4 > l_p$.
Consider the conventional particle swarm settings $\eta_{1,2} \in [0, 1]$ and $c_1 = c_2 = 2$. With $r_4 = c_1\eta_1 + c_2\eta_2 - 1$, the range of $r_4$ is then [-1, 3], and there are two cases in which the decision action a of the agent is invalidated:
① $r_4 < 0$: whatever value $l_p$ takes, the system diverges ($l_p > 0 > r_4$);
② $r_4 > 1$: whatever value $l_p$ takes, the system converges ($l_p < 1 < r_4$). In both cases no decision action a of the agent can change the ordering of $r_4$ and $l_p$.
Therefore, with $0 < l_p < 1$ one should ensure $0 < r_4 < 1$. Based on this analysis the invention sets $\eta_{1,2} \in [0.5, 1]$ and $c_1 = c_2 = 1$: compared with the conventional particle swarm settings, the expected value of $\eta_{1,2}$ rises to 0.75 while $c_1$ and $c_2$ are reduced to 1.0, as the numerical check below illustrates.
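The effect of the revised settings on $r_4$ can be checked by direct sampling: under the conventional settings $r_4$ ranges over [-1, 3] and frequently leaves (0, 1), whereas with $\eta_{1,2} \in [0.5, 1]$ and $c_1 = c_2 = 1$ it stays inside [0, 1]. The sampling below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

def r4_samples(c1, c2, lo, hi):
    """Draw samples of r_4 = c1*eta1 + c2*eta2 - 1 with eta1, eta2 ~ U[lo, hi]."""
    eta1 = rng.uniform(lo, hi, N)
    eta2 = rng.uniform(lo, hi, N)
    return c1 * eta1 + c2 * eta2 - 1.0

conventional = r4_samples(2.0, 2.0, 0.0, 1.0)   # eta in [0, 1], c1 = c2 = 2
revised      = r4_samples(1.0, 1.0, 0.5, 1.0)   # eta in [0.5, 1], c1 = c2 = 1

print("conventional r4 range:", conventional.min(), conventional.max())   # about [-1, 3]
print("fraction of conventional samples outside (0, 1):",
      np.mean((conventional <= 0) | (conventional >= 1)))
print("revised r4 range:", revised.min(), revised.max())                  # inside [0, 1]
```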
Further, consider the over-damping condition of the system, $\xi_p > 1$. Starting from formula (3) one obtains formula (4), and rearranging formula (4) gives formula (5).
Construct an upward-opening quadratic function $y(l_p)$; the discriminant $\Delta y$ of $y(l_p) = 0$ is
$$\Delta y = (2r_4 + 4)^2 - 4\left(r_4^2 - 4 c_2 \eta_3\right) = 16 + 16 r_4 + 16 c_2 \eta_3 \qquad (6)$$
Since $\Delta y > 0$, the function $y(l_p)$ has two zeros. Taking into account that the relevant zero $l_{p0}$ satisfies $l_{p0} < 1$, the larger zero is eliminated; considering also $0 < l_{p0}$, the over-damping condition $\xi_p > 1$ corresponds to $l_p$ lying below the remaining zero $l_{p0}$.
If $c_2 = 1$ and $\eta_3 \in [0, 1]$, then $c_2\eta_3 \in [0, 1]$ and the probability of satisfying this condition is low, i.e. over-damping occurs only when $\eta_3$ takes small values. Consequently, even if the agent wishes to move the particles in an over-damped manner, the condition is unlikely to be met and the decision action a of the agent is invalidated.
To improve the effectiveness of the agent's decisions, the invention controls the probability of over-damping to be about 10% and further refines the particle swarm settings: taking $c_1 = 1.5$, $c_2 = 0.5$, $\eta_{1,2} \in [0.5, 1]$ and $\eta_3 \in [0, 1]$, so that $c_2\eta_3 \in [0, 0.5]$ and the probability of the system becoming over-damped increases.
As shown in FIG. 3, analyzing the relation between the parameter settings of the particle swarm optimization method and the evolution of the particles makes the improvement of the method well targeted.
If the second-order control system is regarded as converged when the steady-state error reaches 2%, the convergence time $t_{system}$ of the particles is given by formula (8), where $t_{system}$ is the particle convergence time. From formulas (3) and (8), increasing the inertia weight increases the convergence time and the system frequency and decreases the system damping. Formula (8) does not cover $\xi_p < 0$: in theory $t_{system}$ tends to infinity when the system diverges, but since the particles eventually return to a converging state and cannot diverge permanently, divergence is regarded as merely prolonging the convergence time. The system is considered divergent when $\xi_p \le \varepsilon_p$, so the convergence time of a divergent system is obtained by rewriting formula (8) accordingly, where ln denotes the natural logarithm and $\varepsilon_p$ is a set threshold.
When $\varepsilon_p < \xi_p < 1$ the second-order control system is under-damped; when $\xi_p \ge 1$ it is over-damped; and when $\xi_p \le \varepsilon_p$ it is in a divergent state.
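With the regime boundaries just listed, classifying the swarm's second-order behaviour and estimating a settling time takes only a few lines. The settling-time expression used here is the textbook 2% criterion, $t \approx -\ln(\varepsilon_p)/(\xi_p\,\omega_p)$, adopted as an assumption since formula (8) appears here only through its symbols; clamping the damping at $\varepsilon_p$ for divergent swarms reflects the statement that divergence only prolongs the convergence time.

```python
import math

def damping_regime(xi_p, eps_p=0.02):
    """Classify the second-order behaviour of the swarm."""
    if xi_p <= eps_p:
        return "divergent"
    if xi_p < 1.0:
        return "under-damped"
    return "over-damped"

def settling_time(xi_p, omega_p, eps_p=0.02):
    """Approximate time to reach a 2% steady-state error (assumed textbook form).

    For a divergent swarm (xi_p <= eps_p) the damping is clamped to eps_p,
    reflecting that divergence is treated as merely prolonging convergence."""
    xi_eff = max(xi_p, eps_p)
    return -math.log(eps_p) / (xi_eff * omega_p)

print(damping_regime(0.01), settling_time(0.01, omega_p=2.0))
print(damping_regime(0.5),  settling_time(0.5,  omega_p=2.0))
```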
The agent state s is defined in terms of: $G_{best}$, the performance index corresponding to the $p_g$ obtained at the current evolution generation; $G'_{best}$, the performance index corresponding to the $p_g$ of the previous generation; $G_i$, the performance indices corresponding to the $p_i$ of the current generation, of which the average avg(·) is taken; and the population diversity $\delta_{swarm}$.
The performance index can be chosen according to the specific problem and requirements; it only needs to be a minimized, dimensionless quantity. Depending on the problem, it may for example express minimum heat absorption, maximum range, or minimum load.
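The four quantities named above can be assembled into the observation handed to the agent. Because the state and diversity definitions appear here only through the quantities they involve, the sketch below simply stacks the raw quantities and uses the mean absolute deviation of the particles from their average position as a stand-in diversity measure; both choices are assumptions rather than the patent's exact expressions.

```python
import numpy as np

def population_diversity(x):
    """Stand-in diversity measure: mean absolute deviation of the particles
    from the swarm's average position in each search dimension."""
    return float(np.mean(np.abs(x - x.mean(axis=0))))

def agent_state(G_best, G_best_prev, G_i, x):
    """Assemble the observation s from the quantities named in the state definition.

    G_best      : performance index of p_g at the current generation
    G_best_prev : performance index of p_g at the previous generation
    G_i         : array of performance indices of the personal bests p_i
    x           : (M_pso, D_pso) current particle positions
    """
    return np.array([G_best, G_best_prev, float(np.mean(G_i)),
                     population_diversity(x)])
```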
Other steps and parameters are the same as in the first or second embodiment.
The fourth embodiment is described with reference to FIG. 2. It differs from embodiments one to three in that the inertia weight $l_p$ takes values in the range (0, 1].
Other steps and parameters are the same as in one of the first to third embodiments.
The fifth embodiment differs from embodiments one to four in that the threshold $\varepsilon_p$ is set to 0.02.
Other steps and parameters are the same as in one of the first to fourth embodiments.
The sixth embodiment differs from embodiments one to five in the method for calculating the population diversity: the diversity is computed from the particle positions, where $\bar{x}_j^{t_p-1}$ denotes the average position of all particles for the jth optimization parameter in the search space at generation $t_p-1$.
Other steps and parameters are the same as in one of the first to fifth embodiments.
The seventh embodiment differs from embodiments one to six in the convergence state of the particles: when the particle swarm optimization method converges, the convergence condition holds, where E denotes the mathematical expectation and $p_{i,j} = p$, $p_{g,j} = g$.
Other steps and parameters are the same as in one of the first to sixth embodiments.
The eighth embodiment differs from embodiments one to seven in that step 25 is specifically:
the action a of the agent is defined as
$a = [l_p, D_w]^T$  (13)
where the superscript T denotes the transpose and $D_w = \pm 1$;
when $D_w = 1$, the agent makes a particle-dispersion decision and assigns the inertia weight $l_p$ according to $\xi_p \le \varepsilon_p$;
when $D_w = -1$, the agent makes a particle-convergence decision and assigns the inertia weight $l_p$ according to $\xi_p > \varepsilon_p$.
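The action is thus a pair: a continuous inertia weight and a binary dispersion/convergence flag. A small container with a consistency check against the damping threshold is sketched below; the check encodes one reading of the two cases above (dispersion paired with $\xi_p \le \varepsilon_p$, convergence with $\xi_p > \varepsilon_p$) and is illustrative rather than the patent's exact assignment rule.

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    l_p: float   # inertia weight, expected in (0, 1]
    D_w: int     # +1: dispersion decision, -1: convergence decision

    def consistent_with(self, xi_p: float, eps_p: float = 0.02) -> bool:
        """Check the decision against the damping ratio it is meant to induce:
        a dispersion decision (D_w = +1) goes with xi_p <= eps_p, and a
        convergence decision (D_w = -1) with xi_p > eps_p."""
        if self.D_w == 1:
            return xi_p <= eps_p
        return xi_p > eps_p

print(AgentAction(l_p=0.4, D_w=-1).consistent_with(xi_p=0.6))   # True
```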
other steps and parameters are the same as those of one of the first to seventh embodiments.
The ninth embodiment differs from embodiments one to eight in that the specific process of step 27 is given by formula (14), where r denotes the reward value of the agent and $G_{best}$ denotes the performance index corresponding to $p_g$ after the particle positions in the swarm have been updated.
Other steps and parameters are the same as in one of the first to eighth embodiments.
Formula (14) states that when the performance index improves, i.e. the particle swarm finds a better position, the agent obtains a reward of 10 in that step; otherwise no reward is given. The goal in designing the reinforcement learning-particle swarm hybrid optimization method is to search the $D_{pso}$ optimization parameters with as few particles and evolution generations as possible; therefore the agent takes a fixed number of steps per round, i.e. the round ends when $t_p = t_{p,max}$.
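The reward rule and the fixed round length described above reduce to a few lines; the function and parameter names are illustrative.

```python
def step_reward(G_best_new: float, G_best_old: float) -> float:
    """Reward as described for formula (14): 10 when the swarm finds a better
    (smaller) performance index, otherwise 0."""
    return 10.0 if G_best_new < G_best_old else 0.0

def round_finished(t_p: int, t_p_max: int = 30) -> bool:
    """Each round lasts a fixed number of evolution steps (t_p_max = m)."""
    return t_p >= t_p_max

print(step_reward(0.8, 1.0), step_reward(1.0, 1.0), round_finished(30))
```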
The tenth embodiment differs from embodiments one to nine in that the specific process of step 3 is as follows:
Step 31: initialize the particle positions and velocities, and update the particle positions from the initialized positions and velocities (i.e., according to formula (1));
Step 32: update the control variables of the sub-orbital hypersonic carrier according to the updated particle positions, the control variables comprising the angle of attack and the bank angle;
Step 33: compute the agent state from the updated particle positions;
Step 34: input the agent state computed in step 33 to the trained agent, which outputs an action;
Step 35: update the particle positions according to the inertia weight $l_p$ in the output action of the agent;
Step 36: repeat steps 32 to 35 until the sub-orbital hypersonic carrier reaches the target point specified by the mission.
In this way the control variables of the sub-orbital hypersonic carrier are updated from the updated particle positions, thereby controlling the sub-orbital hypersonic carrier; a structural sketch of this online loop follows.
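Steps 31 to 36 reuse the trained agent inside the same particle swarm loop, except that each pass now also feeds the current best design variables into the angle-of-attack / bank-angle command and propagates the vehicle until the mission target is reached. The sketch below only shows that control flow; controls_from_particle, propagate, reached_target, performance_index and trained_policy are placeholder names with toy implementations, since the vehicle model and the control parameterization are not spelled out here.

```python
import numpy as np

rng = np.random.default_rng(2)
M_pso, D_pso = 20, 4
c1, c2 = 1.5, 0.5

def controls_from_particle(p):
    """Placeholder: map the design variables to an angle-of-attack / bank-angle command."""
    return {"alpha": float(p[0]), "bank": float(p[1])}

def propagate(vehicle_state, controls):
    """Placeholder one-step vehicle model (a real model integrates the dynamics)."""
    return vehicle_state + 0.1 * (controls["alpha"] - vehicle_state)

def reached_target(vehicle_state, target=1.0, tol=1e-2):
    return abs(vehicle_state - target) < tol

def performance_index(p, vehicle_state):
    """Placeholder performance index (minimized, dimensionless)."""
    return float(np.sum((p - 1.0) ** 2)) + abs(vehicle_state - 1.0)

def trained_policy(state):
    """Placeholder for the trained PPO policy: returns (l_p, D_w)."""
    return 0.6, -1

# Step 31: initialize particle positions and velocities
x = rng.uniform(-1.0, 1.0, (M_pso, D_pso))
v = rng.uniform(-0.1, 0.1, (M_pso, D_pso))
p_i = x.copy()
vehicle_state = 0.0
f_i = np.array([performance_index(p, vehicle_state) for p in p_i])
G_best = G_best_prev = float(f_i.min())

for step in range(500):                                                   # Step 36: loop
    g = p_i[np.argmin(f_i)]
    vehicle_state = propagate(vehicle_state, controls_from_particle(g))   # Step 32
    diversity = float(np.mean(np.abs(x - x.mean(axis=0))))
    state = np.array([G_best, G_best_prev, float(f_i.mean()), diversity]) # Step 33
    l_p, D_w = trained_policy(state)                                      # Step 34
    eta1 = rng.uniform(0.5, 1.0, x.shape)
    eta2 = rng.uniform(0.5, 1.0, x.shape)
    v = l_p * v + c1 * eta1 * (p_i - x) + c2 * eta2 * (g - x)             # Step 35
    x = x + v
    f = np.array([performance_index(p, vehicle_state) for p in x])
    improved = f < f_i
    p_i[improved], f_i[improved] = x[improved], f[improved]
    G_best_prev, G_best = G_best, float(f_i.min())
    if reached_target(vehicle_state):                                     # stop at target
        break

print(f"stopped after {step + 1} steps, vehicle state {vehicle_state:.3f}")
```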
Other steps and parameters are the same as in one of the first to ninth embodiments.
The above examples only describe the calculation model and calculation flow of the invention in detail and do not limit its embodiments. Other variations and modifications of the above description will be apparent to those of ordinary skill in the art; it is not possible to enumerate all embodiments, and obvious variations and modifications derived therefrom remain within the scope of the invention.

Claims (9)

1. An online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization, characterized by comprising the following steps:
Step 1: set the values of the learning factors $c_1$ and $c_2$ and of the random numbers $\eta_1$, $\eta_2$ and $\eta_3$;
set the upper limit on the number of training rounds of the reinforcement learning agent to M and the upper limit on the number of particle swarm evolution generations within a single round to m;
Step 2: train the reinforcement learning agent;
Step 21: initialize the position and velocity of each particle in the swarm, and update the particle positions from the initialized positions and velocities;
Step 22: set the round counter to 1;
Step 23: set the evolution generation counter p = 1;
Step 24: compute the agent state from the position of each particle in the swarm;
Step 25: let the agent randomly explore an action;
Step 26: update the position of each particle in the swarm using the inertia weight $l_p$ contained in the agent's action;
Step 27: compute the agent reward from the updated particle positions; increase the generation counter p by 1 and check whether it is less than or equal to m;
if so, return to step 24;
otherwise, go to step 28;
Step 28: increase the round counter by 1 and check whether it is less than or equal to M;
if so, return to step 23;
if not, end the training;
Step 3: optimize the trajectory of the sub-orbital hypersonic carrier online with the trained reinforcement learning agent;
the specific process of step 3 is as follows:
Step 31: initialize the particle positions and velocities, and update the particle positions from the initialized positions and velocities;
Step 32: update the control variables of the sub-orbital hypersonic carrier according to the updated particle positions, the control variables comprising the angle of attack and the bank angle;
Step 33: compute the agent state from the updated particle positions;
Step 34: input the agent state computed in step 33 to the trained agent, which outputs an action;
Step 35: update the particle positions according to the inertia weight $l_p$ in the output action of the agent;
Step 36: repeat steps 32 to 35 until the sub-orbital hypersonic carrier reaches the target point specified by the mission.
2. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 1, wherein the learning factor $c_1 = 1.5$, the learning factor $c_2 = 0.5$, the random numbers $\eta_1 \in [0.5, 1]$, $\eta_2 \in [0.5, 1]$ and $\eta_3 \in [0, 1]$, M = 2000, and m = 30.
3. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 2, wherein the specific process of step 24 is as follows:
$M_{pso}$ particles search over $D_{pso}$ optimization parameters; the position vector of the ith particle in the search space is $x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,D_{pso}}]^T$, whose components are the ith particle's search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; the velocity vector of the ith particle is $v_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,D_{pso}}]^T$, where the superscript T denotes the transpose and the components are the speeds at which the ith particle moves while searching for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; $i = 1, 2, \ldots, M_{pso}$ and $j = 1, 2, \ldots, D_{pso}$;
the historical best position vector found by all particles in the whole swarm is $p_g = [p_{g,1}, p_{g,2}, \ldots, p_{g,D_{pso}}]^T$, whose components are the global best search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter; the historical best position vector found by the ith particle is $p_i = [p_{i,1}, p_{i,2}, \ldots, p_{i,D_{pso}}]^T$, whose components are that particle's best search results for the 1st, 2nd, ..., $D_{pso}$th optimization parameter;
all particles evolve according to formula (1), the inertia-weight particle swarm update, where $t_p$ is the current evolution generation, $v_{i,j}^{t_p}$ and $v_{i,j}^{t_p-1}$ are the speeds at which the ith particle moves while searching for the jth optimization parameter at generations $t_p$ and $t_p-1$, $l_p$ is the inertia weight, $p_{i,j}$ is the historical best position of the jth optimization parameter found by the ith particle, $p_{g,j}$ is the historical best position of the jth optimization parameter found by all particles, and $x_{i,j}^{t_p}$ and $x_{i,j}^{t_p-1}$ are the ith particle's search results for the jth optimization parameter at generations $t_p$ and $t_p-1$;
introducing the random number $\eta_3$, formula (1) is rewritten as formula (2), in which $x_{i,j}^{t_p+1}$ denotes the ith particle's search result for the jth optimization parameter at generation $t_p+1$;
from formula (2) a second-order control system is obtained, in which $\omega_p$ is the frequency and $\xi_p$ the damping of the second-order control system;
the convergence time $t_{system}$ of the particles is then obtained, where ln denotes the natural logarithm and $\varepsilon_p$ is a set threshold;
the agent state s is defined in terms of: $G_{best}$, the performance index corresponding to the $p_g$ obtained at the current evolution generation; $G'_{best}$, the performance index corresponding to the $p_g$ of the previous generation; $G_i$, the performance indices corresponding to the $p_i$ of the current generation, of which the average avg(·) is taken; and the population diversity $\delta_{swarm}$.
4. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 3, wherein the inertia weight $l_p$ takes values in the range (0, 1].
5. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 4, wherein the threshold $\varepsilon_p$ is set to 0.02.
6. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 5, wherein the population diversity is computed from the particle positions, with $\bar{x}_j^{t_p-1}$ denoting the average position of all particles for the jth optimization parameter in the search space at generation $t_p-1$.
7. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 6, wherein the convergence state of the particles is expressed in terms of the mathematical expectation E, with $p_{i,j} = p$ and $p_{g,j} = g$.
8. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 7, wherein step 25 is specifically:
the action a of the agent is defined as
$a = [l_p, D_w]^T$  (13)
where the superscript T denotes the transpose and $D_w = \pm 1$;
when $D_w = 1$, the agent makes a particle-dispersion decision and assigns the inertia weight $l_p$ according to $\xi_p \le \varepsilon_p$;
when $D_w = -1$, the agent makes a particle-convergence decision and assigns the inertia weight $l_p$ according to $\xi_p > \varepsilon_p$.
9. The online trajectory optimization method for a sub-orbital hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization according to claim 8, wherein in step 27 the agent reward r is computed, where r denotes the reward value of the agent and $G_{best}$ denotes the performance index corresponding to $p_g$ after the particle positions in the swarm have been updated.
CN202310946041.9A 2023-07-28 2023-07-28 On-line track optimization method for sub-track hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization Active CN116956987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310946041.9A CN116956987B (en) 2023-07-28 2023-07-28 On-line track optimization method for sub-track hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310946041.9A CN116956987B (en) 2023-07-28 2023-07-28 On-line track optimization method for sub-track hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization

Publications (2)

Publication Number Publication Date
CN116956987A CN116956987A (en) 2023-10-27
CN116956987B true CN116956987B (en) 2024-03-26

Family

ID=88446069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310946041.9A Active CN116956987B (en) 2023-07-28 2023-07-28 On-line track optimization method for sub-track hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization

Country Status (1)

Country Link
CN (1) CN116956987B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398834A (en) * 2022-01-18 2022-04-26 中国科学院半导体研究所 Training method of particle swarm optimization algorithm model, particle swarm optimization method and device
CN114444648A (en) * 2022-04-08 2022-05-06 中国人民解放军96901部队 Intelligent optimization method based on reinforcement learning and particle swarm optimization
CN116451737A (en) * 2023-04-25 2023-07-18 上海电力大学 PG-W-PSO method for improving particle swarm based on reinforcement learning strategy gradient

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114413906B (en) * 2022-01-18 2022-12-13 哈尔滨工业大学 Three-dimensional trajectory planning method based on improved particle swarm optimization algorithm

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398834A (en) * 2022-01-18 2022-04-26 中国科学院半导体研究所 Training method of particle swarm optimization algorithm model, particle swarm optimization method and device
CN114444648A (en) * 2022-04-08 2022-05-06 中国人民解放军96901部队 Intelligent optimization method based on reinforcement learning and particle swarm optimization
CN116451737A (en) * 2023-04-25 2023-07-18 上海电力大学 PG-W-PSO method for improving particle swarm based on reinforcement learning strategy gradient

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cooperative trajectory planning for aircraft based on an improved particle swarm optimization algorithm; Zhou Hongyu et al.; Acta Automatica Sinica; 2022-11-30; Vol. 48, No. 11, pp. 2670-2676 *

Also Published As

Publication number Publication date
CN116956987A (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN111090941A (en) Spacecraft optimal Lambert orbit rendezvous method based on multi-objective optimization algorithm
Ma et al. Multi-robot target encirclement control with collision avoidance via deep reinforcement learning
CN113093802A (en) Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
CN113962012B (en) Unmanned aerial vehicle countermeasure strategy optimization method and device
Li et al. Autonomous air combat decision‐making of UAV based on parallel self‐play reinforcement learning
CN116501086B (en) Aircraft autonomous avoidance decision method based on reinforcement learning
CN116736883B (en) Unmanned aerial vehicle cluster intelligent cooperative motion planning method
Xianyong et al. Research on maneuvering decision algorithm based on improved deep deterministic policy gradient
CN114859899A (en) Actor-critic stability reinforcement learning method for navigation obstacle avoidance of mobile robot
Cao et al. Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory
Cheng et al. Improved GASA algorithm for mutation strategy UAV path planning
CN113962013B (en) Aircraft countermeasure decision making method and device
Liang et al. Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network
Tan et al. A robust multiple Unmanned Aerial Vehicles 3D path planning strategy via improved particle swarm optimization
CN116956987B (en) On-line track optimization method for sub-track hypersonic carrier based on reinforcement learning-particle swarm hybrid optimization
Kang et al. Air-to-air combat tactical decision method based on SIRMs fuzzy logic and improved genetic algorithm
Huang et al. An Improved Q-Learning Algorithm for Path Planning
Lei et al. Modified Kalman particle swarm optimization: Application for trim problem of very flexible aircraft
CN116796843A (en) Unmanned aerial vehicle many-to-many chase game method based on PSO-M3DDPG
Qingtian Research on cooperate search path planning of multiple UAVs using Dubins curve
Ma et al. Strategy generation based on reinforcement learning with deep deterministic policy gradient for UCAV
Salem et al. Investigation of various optimization algorithms in tuning fuzzy logic-based trajectory tracking control of quadcopter
Yang et al. Decomposed and Prioritized Experience Replay-based MADDPG Algorithm for Multi-UAV Confrontation
Showalter et al. Objective comparison and selection in mono-and multi-objective evolutionary neurocontrollers
Wu et al. Unmanned Aerial Vehicle Path Planning Based on DP-DDPG Algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant