CN115204062A

CN115204062A - Reinforced hybrid differential evolution method and system for interplanetary exploration orbit design

Info

Publication number: CN115204062A
Application number: CN202211118194.6A
Authority: CN
Inventors: 彭雷; 袁卓铭; 戴光明; 王茂才; 宋志明; 陈晓宇
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2022-09-15
Filing date: 2022-09-15
Publication date: 2022-10-18
Anticipated expiration: 2042-09-15
Also published as: CN115204062B

Abstract

The invention discloses a reinforced hybrid differential evolution method and system (RL_HDE) for interplanetary exploration orbit design. The method has two main features: (1) RL_HDE uses the Q-Learning algorithm to adaptively control six different mutation strategies, which strengthens the search ability of the algorithm. The global operator is LSHADE_EIG, an improvement on LSHADE_SPS_EIG (an algorithm from the CEC 2015 international evolutionary computation competition) that does not use the SPS framework; (2) the Q-Learning reinforcement learning algorithm adaptively controls the trigger parameters ρ_1,max and ρ_2,max, balancing the exploration and exploitation capabilities of the algorithm. The beneficial effects of the invention are that the solving speed of interplanetary exploration orbit optimization design is effectively improved, and the calculation precision of the probe orbit is improved.

Description

Reinforced hybrid differential evolution method and system for interplanetary exploration orbit design

Technical Field

The invention relates to the field of interplanetary exploration orbit design, and in particular to a reinforced hybrid differential evolution method and system for interplanetary exploration orbit design.

Background

The design and optimization of interplanetary exploration orbits is one of the key engineering problems of a deep space exploration system. Deep space exploration must account for many complex factors, and when an interplanetary mission (particularly an asteroid exploration mission) has to select its targets from thousands of candidate bodies, the problem scale grows rapidly. The search space is typically large, highly nonlinear, crowded with accompanying extreme points, and has a sensitive basin of attraction around the global optimal solution, all of which make interplanetary orbit design difficult. Existing deep space orbit optimization design methods have the following defects:

(1) Insufficient generality. Existing methods can only solve a single problem, or several problems whose characteristics are consistent.

(2) Because of the high nonlinearity, the accompanying extreme points, and the sensitivity of the basin of attraction of the global optimal solution, algorithms have difficulty finding the optimal feasible solution, their search performance is unstable, and their robustness is poor. Existing optimization methods have not sufficiently studied how to exploit the implicit knowledge in deep space orbit data and the analytic knowledge of the problem when designing the method.

(3) High time consumption. The MIDACO algorithm, which currently offers relatively high generality, depends on supercomputing equipment, and even then it takes a long time (days to weeks) to find a good solution.

Disclosure of Invention

To solve the above technical problems, the invention provides a reinforced hybrid differential evolution method and system (English name: RL_HDE) for interplanetary exploration orbit design. It can effectively improve the solving speed of interplanetary exploration orbit optimization design, improve the calculation precision of the probe orbit, and provide a new solution for orbit design in China's exploration of distant bodies such as Jupiter, Saturn and asteroids.

The method comprises the following steps:

S1, determining the deep space orbit design problem M of the probe to be solved;

S2, constructing the objective function f(x) of problem M, the decision vector x, and the upper boundary x^ub and lower boundary x^lb of the global search region;

S3, initializing the parameters used for Q-learning: the learning rate α and the discount factor γ;

initializing the control parameters Bound_init and Bound_min of the CMA-ES local search region boundary;

initializing the maximum stagnation generation count ρ_1,max and the current stagnation generation count ρ₁ of the global operator LSHADE_EIG;

initializing the maximum stagnation count ρ_2,max and the current stagnation generation count ρ₂ of the local operator CMA-ES;

initializing the scale factor parameter ls_eval of the interior point method;

initializing the maximum number of objective function evaluations MAX_FES and the current number of evaluations FES;

the global operator LSHADE_EIG is used for preliminary exploration of the whole search space to obtain a preliminary global optimal solution;

the stagnation generation count ρ₁ records the accumulated number of stagnations while the global operator LSHADE_EIG is solving;

the local operator searches further within the preliminary solution space to accelerate the solving of the objective function f(x);

the stagnation generation count ρ₂ records the accumulated number of stagnations while the local operator CMA-ES is solving;

separately initializing the Q-Tables used to adaptively control the mutation strategy, ρ_1,max and ρ_2,max, wherein each individual in the LSHADE_EIG operator initializes its own Q-Table to adaptively control the selection of its mutation strategy.

S4, adaptively updating the trigger parameter ρ_1,max using the Q-Learning algorithm;

S5, judging whether ρ₁ is less than ρ_1,max and FES is less than MAX_FES; if yes, entering step S6; otherwise the global search space solving is finished, the Q-Table matrix that adaptively controls the trigger parameter ρ_1,max is updated, and the method proceeds to step S10 to start local solving;

S6, adaptively selecting a mutation strategy using the Q-Learning algorithm;

S7, starting the global operator LSHADE_EIG to perform preliminary exploration of the whole search space;

S8, updating the Q-Table matrix that adaptively controls the mutation strategy;

S9, judging whether the optimal solution obtained by LSHADE_EIG is better than the global optimal solution x^gmin; if so, setting the stagnation generation count ρ₁ to zero and replacing x^gmin with the optimal solution obtained by LSHADE_EIG; otherwise incrementing ρ₁ by 1; after ρ₁ is updated, returning to step S5;

S10, determining the local search space according to the optimal solution obtained by LSHADE_EIG and the control parameters Bound_init and Bound_min;

S11, in the local search space, adaptively updating the trigger parameter ρ_2,max using the Q-Learning algorithm;

S12, judging whether ρ₂ is less than ρ_2,max and FES is less than MAX_FES; if yes, going to step S13; otherwise the CMA-ES local search solving is finished, the Q-Table matrix that adaptively controls the trigger parameter ρ_2,max is updated, and the method proceeds to step S15;

S13, starting the local operator CMA-ES to solve within the local search space;

S14, judging whether the optimal solution obtained by CMA-ES is better than the global optimal solution x^gmin; if so, setting the stagnation generation count ρ₂ to zero and replacing x^gmin with the optimal solution obtained by CMA-ES; otherwise incrementing ρ₂ by 1; after ρ₂ is updated, returning to step S12;

S15, judging whether the current number of evaluations FES is less than 0.75 MAX_FES; if so, returning to step S4. If FES is no longer less than MAX_FES, proceeding to step S16. If FES is greater than 0.75 MAX_FES and less than MAX_FES, updating the global optimal solution using the local operator, the interior point method: judging whether the optimal solution obtained by the interior point method is better than x^gmin, and if so, replacing x^gmin with the optimal solution obtained by the interior point method. Finally, updating the interior point method parameter and proceeding to step S16.

S16, judging whether the current number of evaluations FES is greater than or equal to MAX_FES; if so, the solving is finished and the current x^gmin is the final result; if not, returning to step S4.

The system comprises:

the deep space orbit design problem construction module:

determining the deep space orbit design problem M of the probe to be solved; constructing the objective function f(x) of problem M, the decision vector x, and the upper boundary x^ub and lower boundary x^lb of the global search region;

the deep space orbit design problem parameter initialization module:

initializing the parameters used for Q-learning: the learning rate α and the discount factor γ;

initializing the maximum stagnation generation count ρ_1,max and the current stagnation generation count ρ₁ of the global operator LSHADE_EIG;

initializing the maximum stagnation count ρ_2,max and the current stagnation generation count ρ₂ of the local operator CMA-ES;

initializing the scale factor parameter ls_eval of the interior point method;

separately initializing the Q-Tables used to adaptively control the mutation strategy, ρ_1,max and ρ_2,max; each individual in the LSHADE_EIG operator initializes a Q-Table to adaptively control the selection of its mutation strategy;

the deep space orbit design problem global solving module:

adaptively updating the trigger parameter ρ_1,max using the Q-Learning algorithm;

judging whether ρ₁ is less than ρ_1,max and FES is less than MAX_FES; if yes, adaptively selecting a mutation strategy using the Q-Learning algorithm and starting the global operator LSHADE_EIG; otherwise, updating the Q-Table matrix that adaptively controls the trigger parameter ρ_1,max and entering the deep space orbit design problem local solving module;

starting the global operator LSHADE_EIG to perform preliminary exploration of the whole search space;

updating the Q-Table matrix that adaptively controls the mutation strategy;

judging whether the optimal solution obtained by LSHADE_EIG is better than the global optimal solution x^gmin; if so, setting the stagnation generation count ρ₁ to zero and replacing x^gmin with the optimal solution obtained by LSHADE_EIG; otherwise incrementing ρ₁ by 1; after ρ₁ is updated, returning to the deep space orbit design problem global solving module;

the deep space orbit design problem local solving module:

determining the local search space according to the optimal solution obtained by LSHADE_EIG and the control parameters Bound_init and Bound_min;

in the local search space, adaptively updating the trigger parameter ρ_2,max using the Q-Learning algorithm;

judging whether ρ₂ is less than ρ_2,max and FES is less than MAX_FES; if yes, starting the local operator CMA-ES to solve within the local search space; otherwise, updating the Q-Table matrix that adaptively controls the trigger parameter ρ_2,max and entering the deep space orbit design problem convergence solving module;

judging whether the optimal solution obtained by CMA-ES is better than the global optimal solution x^gmin; if so, setting the stagnation generation count ρ₂ to zero and replacing x^gmin with the optimal solution obtained by CMA-ES; otherwise incrementing ρ₂ by 1; after ρ₂ is updated, returning to the deep space orbit design problem local solving module;

the deep space orbit design problem convergence solving module:

the first solving part: judging whether the current number of evaluations FES is less than 0.75 MAX_FES; if so, returning to the deep space orbit design problem global solving module; if FES is no longer less than MAX_FES, entering the second solving part; if FES is greater than 0.75 MAX_FES and less than MAX_FES, updating the global optimal solution using the local operator, the interior point method: judging whether the optimal solution obtained by the interior point method is better than x^gmin, and if so, replacing x^gmin with the optimal solution obtained by the interior point method; finally, updating the interior point method parameter and entering the second solving part;

the second solving part: judging whether the current number of evaluations FES is greater than or equal to MAX_FES; if so, the solving is finished and the current x^gmin is the final result; if not, returning to the deep space orbit design problem global solving module.

The beneficial effects provided by the invention are as follows: the solving speed of interplanetary exploration orbit optimization design is effectively improved, and the calculation precision of the probe orbit is improved.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2, FIG. 3 and FIG. 4 are detailed flow charts of the method of the present invention, in which Q-learning adaptively controls the mutation strategy, the trigger parameter ρ_1,max and the trigger parameter ρ_2,max;

FIG. 5 is a schematic diagram of the Q-Table matrix Q_DE of ρ_1,max after division;

FIG. 6 is a schematic diagram of the Q-Table matrix of the adaptive mutation strategy control after division;

FIG. 7 is a schematic diagram of the Q-Table matrix Q_CMA of ρ_2,max after division.

Detailed Description

To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.

Before describing the present application in detail, some basic concepts that will be mentioned later are introduced in advance.

(1) The maximum stagnation generation count ρ_1,max and the current stagnation generation count ρ₁ of the global operator LSHADE_EIG; the global operator LSHADE_EIG is used for preliminary exploration of the whole search space to obtain a preliminary solution space;

(2) The maximum stagnation count ρ_2,max and the current stagnation generation count ρ₂ of the local operator CMA-ES;

The local operator searches further within the preliminary solution space to accelerate the solving of the objective function f(x); the stagnation generation count ρ₂ records the accumulated number of stagnations while the local operator CMA-ES is solving;

(3) The maximum number of objective function evaluations MAX_FES and the current number of evaluations FES;

MAX_FES caps the number of objective function evaluations over the whole solving process, so that the process can end properly; FES, as the name implies, records how many times the objective function has been evaluated so far.

(4) Q-Learning algorithm and Q-Table matrix

Q-Learning is an off-policy learning algorithm that does not require a model of the environment, and it is widely used because it is simple to apply, converges quickly and is computationally cheap. The main idea of Q-Learning is to evaluate, from the immediate reward r and the current Q-value (a value in the Q-Table matrix), the action a to take in the next state s. All of these quantities are computed iteratively through the Q-Table (see the following table). Let the state set be S={s₁,s₂,...,s_i} and the action set be A={a₁,a₂,...,a_j}, so that the agent has i states and j actions in total. Q(s_t,a_t) denotes the cumulative future return obtained by selecting action a_t in the current state s_t. During each interaction, the agent is in a state s_t and selects the best action a_t to perform; after the action has been taken, the environment gives a reward r_t+1 and the agent transitions to a new state s_t+1. Through this iteration, the agent learns from past experience and forms an estimate Q(s_t,a_t) of the expected value of each action.

Referring to fig. 1, fig. 1 is a simplified flow chart of the method of the present invention.

The method mainly uses a Q-Learning-based hybrid differential evolution algorithm to solve the orbit design problem of a deep space probe. To solve the problem, the corresponding population parameters are initialized and the global operator LSHADE_EIG performs iterative evolutionary solving, yielding a local search space within the global search space; within that local search space, the local operator CMA-ES performs iterative evolutionary solving to obtain a more precise solution space; finally, within the precise solution space, the interior point method updates the optimal solution until the iteration condition is met or the maximum number of objective function evaluations is reached, completing the solving of the objective function.
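The alternation between a global phase and a local phase, each ended by its own stagnation counter, can be sketched as follows. This is only a minimal illustration under loud assumptions: uniform random sampling stands in for both LSHADE_EIG and CMA-ES, the trigger values `rho1_max` and `rho2_max` are fixed rather than adapted by Q-Learning, the local box half-width of 10% of the global range is a hypothetical choice, the counters are reset at the start of each phase, and the interior point phase is omitted. The function name `hybrid_search` is hypothetical.

```python
import random


def hybrid_search(objective, lower, upper, max_fes,
                  rho1_max=20, rho2_max=10, seed=0):
    """Toy global/local alternation driven by stagnation counters."""
    rng = random.Random(seed)
    dim = len(lower)
    fes = 0  # current number of objective function evaluations (FES)

    def evaluate(x):
        nonlocal fes
        fes += 1
        return objective(x)

    def sample(lo, hi):
        return [lo[j] + rng.random() * (hi[j] - lo[j]) for j in range(dim)]

    best_x = sample(lower, upper)
    best_f = evaluate(best_x)
    while fes < max_fes:
        # Global phase: stand-in for LSHADE_EIG, samples the whole box.
        rho1 = 0
        while rho1 < rho1_max and fes < max_fes:
            x = sample(lower, upper)
            fx = evaluate(x)
            if fx < best_f:
                best_x, best_f, rho1 = x, fx, 0  # improvement resets rho1
            else:
                rho1 += 1
        # Local phase: stand-in for CMA-ES, samples a box around best_x.
        half = [0.1 * (upper[j] - lower[j]) for j in range(dim)]  # assumed width
        lo = [max(lower[j], best_x[j] - half[j]) for j in range(dim)]
        hi = [min(upper[j], best_x[j] + half[j]) for j in range(dim)]
        rho2 = 0
        while rho2 < rho2_max and fes < max_fes:
            x = sample(lo, hi)
            fx = evaluate(x)
            if fx < best_f:
                best_x, best_f, rho2 = x, fx, 0  # improvement resets rho2
            else:
                rho2 += 1
    return best_x, best_f, fes
```

Calling `hybrid_search(lambda v: sum(t * t for t in v), [-5.0, -5.0], [5.0, 5.0], 2000)` runs the loop on a two-dimensional sphere function and stops exactly when the evaluation budget is used up.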

Please refer to fig. 2, fig. 3 and fig. 4.

Fig. 2, fig. 3 and fig. 4 are detailed flowcharts of the method of the present invention. The invention provides a reinforced hybrid differential evolution method for interplanetary exploration orbit design, which specifically comprises the following steps of:

It should be noted that the deep space orbit design problem M can be, for example, computing the cumulative velocity change ΔV of the probe over the deep space exploration mission, or its cumulative energy change; the present application gives these only as illustrations and does not limit the specific problem.

S2, constructing the objective function f(x) of problem M, the decision vector x, and the upper boundary x^ub and lower boundary x^lb of the global search region; it should be noted that the decision vector x has dimension D;

S3, initializing the parameters used for Q-learning: the learning rate α and the discount factor γ; as an example, in this application the learning rate α is initialized to 0.1 and the discount factor γ is initialized to 0.9;

initializing the control parameters Bound_init and Bound_min of the CMA-ES local search region boundary; as an example, in this application Bound_init is initialized to 0.5 and Bound_min is initialized to 0.1;

initializing the maximum stagnation generation count ρ_1,max and the current stagnation generation count ρ₁ of the global operator LSHADE_EIG; as an example, in this application ρ_1,max is initialized to 20 and ρ₁ is initialized to 0;

initializing the maximum stagnation count ρ_2,max and the current stagnation generation count ρ₂ of the local operator CMA-ES;

as an example, in this application ρ_2,max is initialized to 10 and ρ₂ is initialized to 0;

initializing the scale factor parameter ls_eval of the interior point method; as an example, in this application ls_eval is initialized to 0.01;

the global operator LSHADE_EIG is used for preliminary exploration of the whole search space to obtain a preliminary solution space;

the stagnation generation count ρ₂ records the accumulated number of stagnations while the local operator CMA-ES is solving;

S4, adaptively updating the trigger parameter ρ_1,max using the Q-Learning algorithm;

It should be noted that the Q-Table matrix Q_DE, which Q-Learning uses to adaptively control the trigger parameter ρ_1,max, is initialized as follows: Q_DE divides the states according to a first parameter Sc_DE1 and a first parameter f_DE1, whose states combine into six population evolution states, and it includes seven preset first action update values; the population evolution states form the rows of the matrix Q_DE and the first action update values form its columns; the first parameter Sc_DE1 has three states and the first parameter f_DE1 has two states.

Referring to FIG. 5, FIG. 5 is a schematic diagram of the matrix after state division; the six combined states are obtained by combining the states of the first parameter Sc_DE1 with the states of the first parameter f_DE1;

the first parameter Sc_DE1 and the first parameter f_DE1 satisfy the following formulas:

（4.1）

（4.2）

wherein X_DE is the final population obtained after evolution by the LSHADE_EIG operator, X₀ is the initial population from which the LSHADE_EIG operator starts to evolve, the diversity function evaluates the population diversity of the input population, and the avg_fitness function computes the average fitness of the individuals in the input population;

(4.3)

(4.4)

L is the length of the diagonal of the search space S∈R^D, NP is the population size, f(x_i) is the objective function value of individual x_i, the average of the j-th dimension variable is taken over all individuals in the population, and x_j,i is the value of the j-th dimension variable of the i-th individual in the population;

the seven preset first action update values are -5, -3, -1, 0, 1, 3 and 5;
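The bodies of formulas (4.3) and (4.4) are not reproduced in this text. As a hedged illustration only, the sketch below assumes a commonly used form that is consistent with the symbols defined above: diversity as the mean Euclidean distance of the individuals to the population centroid, normalized by the diagonal length L, and avg_fitness as the plain mean of the objective values. It also lays out Q_DE as a 6 x 7 table. The zero initialization and the helper `state_index` are hypothetical; the thresholds that map Sc_DE1 and f_DE1 to their states come from formulas (4.1) and (4.2) and are not modeled here.

```python
import math


def diversity(population, lower, upper):
    """Mean distance to the centroid, divided by the diagonal length L."""
    np_size = len(population)
    dim = len(lower)
    # L: diagonal length of the box-shaped search space.
    diag = math.sqrt(sum((upper[j] - lower[j]) ** 2 for j in range(dim)))
    # Average of each dimension over all individuals.
    centroid = [sum(ind[j] for ind in population) / np_size for j in range(dim)]
    total = 0.0
    for ind in population:
        total += math.sqrt(sum((ind[j] - centroid[j]) ** 2 for j in range(dim)))
    return total / (np_size * diag)


def avg_fitness(population, objective):
    """Average objective value of the individuals in the population."""
    return sum(objective(ind) for ind in population) / len(population)


# Q_DE: 3 states of Sc_DE1 times 2 states of f_DE1 give 6 rows;
# the seven preset update values give 7 columns.
FIRST_ACTIONS = [-5, -3, -1, 0, 1, 3, 5]
N_STATES = 3 * 2
q_de = [[0.0] * len(FIRST_ACTIONS) for _ in range(N_STATES)]  # zero init assumed


def state_index(sc_state, f_state):
    """Hypothetical row layout: sc_state in 0..2, f_state in 0..1."""
    return sc_state * 2 + f_state
```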

S5, judging whether ρ₁ is less than ρ_1,max and FES is less than MAX_FES; if yes, entering step S6; otherwise the global search space solving is finished, the Q-Table matrix that adaptively controls the trigger parameter ρ_1,max is updated, and the method proceeds to step S10 to start local solving;

It should be noted that the adaptive setting of ρ_1,max in S4 mainly proceeds as follows:

the probability of selecting each of the preset first actions in the current state is calculated according to formula (5.1), and one action is then randomly selected for execution according to these probabilities.

wherein p(s_i,a_j) is the probability of selecting action a_j in state s_i, Q(s_i,a_j) is the Q value in the Q-Table corresponding to selecting action a_j in state s_i, and n is the number of action types.

（5.1）

（5.2）

wherein ε₁ is one of the seven first action update values.
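The bodies of formulas (5.1) and (5.2) are not reproduced in this text. The sketch below is an assumption-laden illustration of probabilistic action selection over one Q-Table row: it assumes a softmax over the Q values for (5.1) and an additive update of ρ_1,max by the chosen value ε₁ for (5.2). The lower bound `floor` that keeps the trigger positive is a hypothetical safeguard, not something the source states.

```python
import math
import random

# Seven preset update values for rho_1,max, as listed in the text.
FIRST_ACTIONS = [-5, -3, -1, 0, 1, 3, 5]


def action_probabilities(q_row):
    """Softmax over one Q-Table row (assumed form of formula (5.1))."""
    shift = max(q_row)  # subtract the maximum for numerical stability
    weights = [math.exp(q - shift) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_row, rng):
    """Draw an action index at random according to the probabilities."""
    probs = action_probabilities(q_row)
    return rng.choices(range(len(q_row)), weights=probs, k=1)[0]


def update_rho_max(rho_max, q_row, rng, floor=1):
    """Assumed form of formula (5.2): add the selected update value."""
    idx = select_action(q_row, rng)
    eps = FIRST_ACTIONS[idx]
    # The floor is a hypothetical guard so the trigger stays positive.
    return max(floor, rho_max + eps), idx
```

With an all-zero row every action is equally likely, so early iterations explore the update values uniformly; as Q values diverge, the higher-valued actions are drawn more often.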

If ρ₁ is greater than or equal to ρ_1,max, the global search is finished and the reward is distributed according to formula (5.3), where x^gmin is the global optimal solution;

（5.3）

Q_DE is then updated according to formula (8.1);

S6, adaptively selecting a mutation strategy using the Q-Learning algorithm;

It should be noted that the Q-Table matrix that adaptively controls the mutation strategy is Q_strategy; Q_strategy divides the states according to a second parameter Sc_DE2 and a second parameter f_DE2, whose states combine into twenty population evolution states, and it includes six mutation strategies; the population evolution states form the rows of the matrix Q_strategy and the mutation strategies form its columns; the second parameter Sc_DE2 has five states and the second parameter f_DE2 has four states. The six mutation strategies are shown in Table 1.

TABLE 1 Six different mutation strategies

wherein i≠r₁≠r₂≠r₃≠r₄; x_r1,G, x_r3,G and x_r4,G are individuals randomly selected from the population, and x_r2,G is an individual randomly selected from the union of the population and the external archive. The external archive protects the diversity of the population by storing the parent vectors that fail in the selection process. x_best,G is the best individual in the population, and x_pbest,G is an individual among the top-ranked p of the population.

Referring to FIG. 6, FIG. 6 is a schematic diagram of the Q-Table matrix of the adaptive mutation strategy control after division. This step divides the population into twenty states according to the state of the second parameter Sc_DE2 and the value of the second parameter f_DE2.

The second parameter Sc_DE2 and the second parameter f_DE2 satisfy the following formulas:

（6.1）

（6.2）

wherein,

（6.3）

（6.4）

the parameters in these formulas are defined as above, and X_G is the G-th generation population in the evolution process of the LSHADE_EIG operator;

NP individuals (solution vectors of M) are randomly generated within the range bounded by the upper boundary x^ub and the lower boundary x^lb, and these individuals x_i together form the first-generation population X₀. Population initialization satisfies equation (7.1):

$$x_{j,i} = x_j^{lb} + rand_{i,j}(0,1)\cdot\left(x_j^{ub} - x_j^{lb}\right) \qquad (7.1)$$

rand_i,j(0,1) is a random variable taking values in the range [0,1];
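Population initialization within the box bounds can be sketched as below. The sketch assumes the usual differential evolution form, in which each coordinate is drawn independently and uniformly between its lower and upper bound; the uniform distribution and the function name `init_population` are assumptions, since the text only says that rand_i,j(0,1) takes values in [0,1].

```python
import random


def init_population(np_size, lower, upper, rng):
    """Generate NP individuals, each coordinate drawn between its bounds."""
    dim = len(lower)
    return [
        # rng.random() plays the role of rand_i,j(0,1) for individual i, dimension j.
        [lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(dim)]
        for _ in range(np_size)
    ]
```

For example, `init_population(5, [-1.0, 0.0], [1.0, 10.0], random.Random(1))` returns five two-dimensional individuals, each inside the given bounds.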

the probability of selecting each mutation strategy in the current state is calculated according to formula (5.1), one mutation strategy is selected for execution according to these probabilities, and the mutation operation is performed on the population;

after mutation, the EIG crossover operator is applied to the population to generate crossover individuals;

selection is performed on the resulting crossover individuals and the reward is distributed according to equation (7.2), where u_i is the individual obtained from individual x_i after mutation and crossover.

(7.2)

S8, updating the Q-Table matrix that adaptively controls the mutation strategy;

Q_strategy is updated according to equation (8.1), where α is the learning rate and γ is the discount factor. r_t+1 is the reward obtained after the agent performs action a_t, s_t+1 is the state the agent moves to at the next time step after performing action a_t in state s_t, and max_a Q(s_t+1,a) denotes the maximum Q value in Q_strategy for state s_t+1;

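$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right] \qquad (8.1)$$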
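The standard Q-Learning update, which uses exactly the symbols defined for formula (8.1) (learning rate α, discount factor γ, reward r_t+1 and the maximum Q value of the next state), can be sketched as follows. The list-of-lists table layout and the function name `q_update` are illustrative assumptions; the default values 0.1 and 0.9 are the example initial values given in the text.

```python
def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one Q-Learning update to q_table[s][a] and return the new value."""
    best_next = max(q_table[s_next])        # max_a Q(s_{t+1}, a)
    td_target = reward + gamma * best_next  # r_{t+1} + gamma * max_a Q(s_{t+1}, a)
    q_table[s][a] += alpha * (td_target - q_table[s][a])
    return q_table[s][a]
```

The same function would serve Q_strategy, Q_DE and Q_CMA, since the text updates all three tables with formula (8.1); only the state and action sets differ.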

S9, judging whether the optimal solution obtained by LSHADE_EIG is better than the global optimal solution x^gmin; if so, setting the stagnation generation count ρ₁ to zero and replacing x^gmin with the optimal solution obtained by LSHADE_EIG; otherwise incrementing ρ₁ by 1; after ρ₁ is updated, returning to step S5;

ρ₁ is updated according to equation (9.1);

（9.1）
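The stagnation bookkeeping of step S9 can be sketched as below. The body of formula (9.1) is not reproduced in this text, so the sketch assumes a minimization problem in which "better" means a strictly lower objective value; the function name `update_stagnation` is hypothetical. The same rule would apply to ρ₂ in step S14.

```python
def update_stagnation(rho, f_candidate, f_global_best):
    """Return the new stagnation count and the new global best objective value."""
    if f_candidate < f_global_best:
        # Improvement: reset the counter and accept the candidate as x^gmin.
        return 0, f_candidate
    # No improvement: the operator has stagnated for one more generation.
    return rho + 1, f_global_best
```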

S10, determining the local search space according to the optimal solution obtained by LSHADE_EIG and the control parameters Bound_init and Bound_min;

it should be noted that the search space of CMA-ES is determined according to formulas (10.1), (10.2) and (10.3);

(10.1)

(10.2)

(10.3)

wherein x^LSlb and x^LSub are the minimum and maximum boundary vectors of the CMA-ES local search space, and x^lb and x^ub are the minimum and maximum boundary vectors of the global search space.

The formulas use the optimal solution obtained by LSHADE_EIG; Bound_init and Bound_min are the control parameters of the local search space and are initialized to 0.5 and 0.1, respectively. Bound is a scale factor that controls the degree of scaling of x^LSlb and x^LSub.

S12, judging whether ρ₂ is less than ρ_2,max and FES is less than MAX_FES; if yes, going to step S13; otherwise the CMA-ES local search solving is finished, the Q-Table matrix that adaptively controls the trigger parameter ρ_2,max is updated, and the method proceeds to step S15;

it should be noted that, in step S11, the probability of selecting each action in the current state is calculated according to formula (5.1), one action is randomly selected for execution according to these probabilities, and ρ_2,max is updated through formula (12.1), where ε₂ is one of the five second action update values;

(12.1)

the Q-Table matrix of the parameter ρ_2,max is Q_CMA; Q_CMA divides the states according to the parameter Suc_CMA and the parameter Ratio_CMA, whose states combine into eight population evolution states, and it includes five preset second action update values; the population evolution states form the rows of the matrix Q_CMA and the second action update values form its columns; the parameter Suc_CMA has two states and the parameter Ratio_CMA has four states.

Referring to FIG. 7, FIG. 7 is a schematic diagram of the Q-Table matrix Q_CMA of ρ_2,max after division;

the parameter Suc_CMA and the parameter Ratio_CMA are calculated as shown in formulas (12.2) and (12.3). There are five second action update values for adaptively updating ρ_2,max: -5, -3, 0, 1 and 2;

(12.2)

(12.3)

if ρ₂ is greater than or equal to ρ_2,max, the CMA-ES local search is finished and the reward is distributed according to formula (12.4), where x^gmin is the global optimal solution;

(12.4)

and Q_CMA is updated according to formula (8.1);

S14, judging whether the optimal solution obtained by CMA-ES is better than the global optimal solution x^gmin; if so, setting the stagnation generation count ρ₂ to zero and replacing x^gmin with the optimal solution obtained by CMA-ES; otherwise incrementing ρ₂ by 1; after ρ₂ is updated, returning to step S12;

steps S12 to S14 proceed as follows:

initializing the parameters, with initial values satisfying formula (14.1):

（14.1）

where E_n is the identity matrix of order n and n is the dimension of the problem;

generating new individuals from a normal distribution, see formula (14.2):

（14.2）

sorting the individuals in the population by fitness and, according to formula (14.3), using the best μ individuals to update the mean vector m, where the quantities involved are calculated according to formula (14.4):

（14.3）

（14.4）

updating the step size and the covariance matrix C_t+1;

updating ρ₂ according to equation (14.5), where the comparison is between the optimal solution obtained by CMA-ES and the global optimal solution x^gmin;

（14.5）
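The mean update from the best μ individuals can be illustrated with the sketch below. The bodies of formulas (14.3) and (14.4) are not reproduced in this text, so the logarithmically decreasing recombination weights used here are an assumption borrowed from the common CMA-ES default, not necessarily the weights of the patent. The step-size and covariance updates are left out, and both function names are hypothetical.

```python
import math


def recombination_weights(mu):
    """Positive, decreasing weights that sum to 1 (assumed default form)."""
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def update_mean(samples, objective, mu):
    """Weighted average of the mu best samples, best sample weighted most."""
    ranked = sorted(samples, key=objective)[:mu]  # minimization assumed
    weights = recombination_weights(mu)
    dim = len(ranked[0])
    return [
        sum(w * x[j] for w, x in zip(weights, ranked))
        for j in range(dim)
    ]
```

Because the weights decrease with rank, the new mean is pulled most strongly toward the best sampled individual, which is what moves the search distribution toward better regions between generations.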

S15, judging whether the current number of evaluations FES is less than 0.75 MAX_FES; if so, returning to step S4. If FES is no longer less than MAX_FES, proceeding to step S16. If FES is greater than 0.75 MAX_FES and less than MAX_FES, updating the global optimal solution using the local operator, the interior point method: judging whether the optimal solution obtained by the interior point method is better than x^gmin, and if so, replacing x^gmin with the optimal solution obtained by the interior point method. Finally, updating the interior point method parameter and proceeding to step S16;

S16, judging whether the current number of evaluations FES is greater than or equal to MAX_FES; if so, the solving is finished and the current x^gmin is the final result; if not, returning to step S4.
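The budget-based branching of steps S15 and S16 can be summarized in a small sketch. The function name `next_phase` and the string labels are hypothetical; the text does not say which branch applies when FES equals exactly 0.75 MAX_FES, and the sketch assigns that boundary case to the interior point branch as an assumption.

```python
def next_phase(fes, max_fes):
    """Classify the current evaluation count into the branch taken next."""
    if fes >= max_fes:
        return "finished"        # S16: budget exhausted, x^gmin is final
    if fes < 0.75 * max_fes:
        return "global_search"   # S15: return to step S4
    # Last quarter of the budget: refine x^gmin with the interior point
    # method, after which S16 sends the method back to step S4.
    return "interior_point"
```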

The local operator, the interior point method, used in steps S15 to S16 proceeds as follows:

taking an initial penalty factor μ₀ and an allowable error ε>0;

taking an initial point x₀ in the feasible region and setting k=1, where x₀ is the current global optimal solution;

constructing the penalty function;

in iteration k, solving for the point x_k starting from x_k-1;

if the termination condition is satisfied, the optimal solution x_k is obtained; otherwise, proceeding to the next iteration;

updating the interior point method parameter ls_eval according to formula (16.1), where x^gmin is the global optimal solution;

（16.1）

when the stopping condition is met, the solving is complete.
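The penalty-factor loop described above can be illustrated with a one-dimensional log-barrier sketch. Everything concrete here is an assumption made for illustration: the penalty function and termination condition of the patent are not reproduced in this text, so the sketch uses the classical logarithmic barrier for box constraints, shrinks the penalty factor by a fixed ratio, stops when the barrier term falls below the allowable error, and solves each subproblem with a golden-section search instead of warm-starting from x_k-1. The update of ls_eval in formula (16.1) is not modeled, and the function name `barrier_minimize` is hypothetical.

```python
import math


def barrier_minimize(f, lo, hi, mu0=1.0, shrink=0.1, eps=1e-6, max_outer=50):
    """Log-barrier sketch for min f(x) subject to lo < x < hi (1-D, f convex)."""

    def phi(x, mu):
        # Penalty function: objective plus logarithmic barrier on both bounds.
        return f(x) - mu * (math.log(x - lo) + math.log(hi - x))

    def golden_section(mu, a, b, tol=1e-12):
        inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        while b - a > tol:
            if phi(c, mu) < phi(d, mu):
                b, d = d, c
                c = b - inv_phi * (b - a)
            else:
                a, c = c, d
                d = a + inv_phi * (b - a)
        return (a + b) / 2.0

    mu = mu0
    x = golden_section(mu, lo, hi)
    for _ in range(max_outer):
        barrier = -mu * (math.log(x - lo) + math.log(hi - x))
        if abs(barrier) < eps:  # termination: barrier term is negligible
            break
        mu *= shrink            # reduce the penalty factor
        x = golden_section(mu, lo, hi)
    return x
```

For `f(x) = (x - 2)**2` on the interval (0, 1), the unconstrained minimum lies outside the box, and the iterates approach the bound x = 1 from the inside as the penalty factor shrinks.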

Based on the above method, the invention provides a reinforced hybrid differential evolution system for interplanetary exploration orbit design. The system comprises:

the deep space orbit design problem construction module:

determining the deep space orbit design problem M of the probe to be solved; constructing the objective function f(x) of problem M, the decision vector x, and the upper boundary x^ub and lower boundary x^lb of the global search region;

the deep space orbit design problem parameter initialization module:

initializing the parameters used for Q-learning: the learning rate α and the discount factor γ;

initializing the maximum stagnation count ρ_2,max and the current stagnation generation count ρ₂ of the local operator CMA-ES;

initializing the scale factor parameter ls_eval of the interior point method;

the deep space orbit design problem global solving module:

judging whether ρ₁ is less than ρ_1,max and FES is less than MAX_FES; if yes, adaptively selecting a mutation strategy using the Q-Learning algorithm and starting the global operator LSHADE_EIG; otherwise, updating the Q-Table matrix that adaptively controls the trigger parameter ρ_1,max and entering the deep space orbit design problem local solving module;

updating the Q-Table matrix that adaptively controls the mutation strategy;

judging whether the optimal solution obtained by LSHADE_EIG is better than the global optimal solution x^gmin; if so, setting the stagnation generation count ρ₁ to zero and replacing x^gmin with the optimal solution obtained by LSHADE_EIG; otherwise incrementing ρ₁ by 1; after ρ₁ is updated, returning to the deep space orbit design problem global solving module;

the local solving module of the deep space track design problem comprises:

according to

in the local search space, adaptively updating the trigger parameter ρ_{2,max} with the Q-Learning algorithm;

judging whether

holds, where

is the optimal solution obtained by CMA-ES and x^gmin is the global optimal solution; if so, the stagnation generation count ρ_2 is set to zero and x^gmin is replaced by

; otherwise, the stagnation generation count ρ_2 is incremented by 1; after ρ_2 is updated, returning to the local solving module of the deep space track design problem;

the deep space track design problem convergence solving module comprises:

if so, x^gmin is replaced by

, where

is the optimal solution obtained by the interior point method; finally, updating the parameters of the local operator interior point method and entering the second solving part;

the second solving part: judging whether the current number of solution evaluations FES is greater than or equal to MAX_FES; if so, the solution ends and the current x^gmin is the final result; otherwise, returning to the global solving module of the deep space track design problem.
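The control flow of the three solving modules can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the objective is minimised, FES is counted as one unit per objective evaluation, the stagnation counters are reset each time a module is entered, the limits `rho1_max` and `rho2_max` are held fixed instead of being adapted by Q-Learning, and `global_step`, `local_step` and `polish_step` are hypothetical stand-ins for LSHADE_EIG, CMA-ES and the interior point method.

```python
def rl_hde_skeleton(f, global_step, local_step, polish_step, x0,
                    max_fes, rho1_max=5, rho2_max=5):
    """Alternate global search, local search and late-stage polishing.

    f is minimised (assumption). global_step, local_step and polish_step are
    hypothetical callables that map the incumbent x_gmin to one candidate.
    """
    x_gmin, f_gmin = x0, f(x0)
    fes = 1  # evaluations spent so far

    def try_candidate(cand):
        # Evaluate one candidate; keep it only if it improves on x_gmin.
        nonlocal x_gmin, f_gmin, fes
        f_cand = f(cand)
        fes += 1
        if f_cand < f_gmin:
            x_gmin, f_gmin = cand, f_cand
            return True
        return False

    while fes < max_fes:
        # Global solving module: runs while rho_1 < rho_1,max and FES < MAX_FES.
        rho1 = 0
        while rho1 < rho1_max and fes < max_fes:
            rho1 = 0 if try_candidate(global_step(x_gmin)) else rho1 + 1
        # Local solving module: same stagnation rule with rho_2 and rho_2,max.
        rho2 = 0
        while rho2 < rho2_max and fes < max_fes:
            rho2 = 0 if try_candidate(local_step(x_gmin)) else rho2 + 1
        # Convergence module: polish only when 0.75*MAX_FES < FES < MAX_FES.
        if 0.75 * max_fes < fes < max_fes:
            try_candidate(polish_step(x_gmin))
    return x_gmin, f_gmin, fes
```

Because every evaluation is guarded by `fes < max_fes`, the loop stops exactly when the budget is spent, which mirrors the final check of FES against MAX_FES in the second solving part.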

As an example, the proposed method is compared with other methods; see Table 2.

Table 2. Friedman test results of RL_HDE (the method of the present application) compared with other methods

The method was applied to seven well-known interplanetary exploration orbit design tasks, and its performance was verified to be superior to that of the other design methods.

The seven interplanetary exploration tasks are: the Saturn exploration Cassini tasks (Cassini1 and Cassini2 for short), the asteroid TW229 exploration task (Gtoc1 for short), the 67P/Churyumov-Gerasimenko comet exploration Rosetta task (Rosetta for short), the Jupiter flyby exploration task (Sagas for short), and the Mercury rendezvous exploration Messenger tasks (Messenger and Messenger-Full for short).

In the Friedman analysis of the design results in Table 2, a lower score indicates better performance of the corresponding design method. RL_HDE obtained the lowest score among the compared methods, 2.7143, indicating that its optimization performance on the seven interplanetary orbit design tasks above is superior to that of the compared methods.
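How such a score can arise is sketched below, assuming the Friedman score is the mean rank of each method over the test problems (the usual convention; the text does not spell it out). The function name and the input layout are illustrative only, and no data from Table 2 is reproduced.

```python
def friedman_mean_ranks(scores):
    """Return the mean rank of each method over all problems.

    scores[p][m] is the objective value of method m on problem p, and a
    lower value is better. Tied methods share the average of their ranks.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        # Method indices sorted from best (lowest value) to worst.
        order = sorted(range(n_methods), key=lambda m: row[m])
        i = 0
        while i < n_methods:
            # Extend j to cover the whole group tied with position i.
            j = i
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # ranks are 1-based
            for k in range(i, j + 1):
                totals[order[k]] += avg_rank
            i = j + 1
    return [t / len(scores) for t in totals]
```

Under this reading, a mean rank of 2.7143 over seven problems corresponds to a rank sum of 19, since 19/7 ≈ 2.7143.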

The innovation points of the invention are as follows:

(1) RL_HDE uses the Q-Learning algorithm to adaptively control six different mutation strategies, enhancing the optimization ability of the algorithm. For this adaptive control of the six mutation strategies, the global operator uses the LSHADE_EIG method, which modifies LSHADE_SPS_EIG, an algorithm from the international evolutionary computation competition (CEC 2015), by no longer using the SPS framework.

(2) The invention uses the Q-Learning algorithm to adaptively control the trigger parameters ρ_{1,max} and ρ_{2,max}, better balancing the exploration and exploitation capabilities of the method.
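The Q-Table mechanism behind both points can be sketched with the standard Q-Learning update, Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)], using the learning rate α and discount factor γ named in the method. The table sizes follow the claims (6×7 for Q_DE, 20×6 for Q_strategy, 8×5 for Q_CMA). Everything else is an assumption made for illustration: the class and function names, the default values of α and γ, the epsilon-greedy selection rule, and the reward, which the caller must supply.

```python
import random


def state_index(i, j, n_j):
    """Flatten two discretised indicators (i, j) into one row index."""
    return i * n_j + j


class QTable:
    """Rows are population evolution states, columns are actions."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        # alpha and gamma defaults are placeholders, not values from the source.
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma

    def select(self, state, epsilon=0.1, rng=random):
        """Epsilon-greedy action choice (an assumed policy)."""
        row = self.q[state]
        if rng.random() < epsilon:
            return rng.randrange(len(row))
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-Learning update."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Table shapes taken from the claims: states x actions.
Q_DE = QTable(3 * 2, 7)        # controls rho_1,max
Q_strategy = QTable(5 * 4, 6)  # selects one of six mutation strategies
Q_CMA = QTable(2 * 4, 5)       # controls rho_2,max
```

Keeping one table per decision lets the strategy choice and the two trigger parameters be learned independently, each from its own state description.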

The beneficial effects of the invention are: the solving speed of the interplanetary exploration orbit optimization design is effectively improved, and the calculation accuracy of the probe orbit is improved.

The above description covers only preferred embodiments of the present invention and does not limit its scope; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A reinforced hybrid differential evolution method for interplanetary exploration orbit design, characterized by comprising the following steps:

S2, constructing the objective function f(x) of problem M, the decision vector x, and the upper bound x^ub and lower bound x^lb of the global search region;

S3, initializing the Q-learning parameters: learning rate α and discount factor γ;

initializing the maximum stagnation generation count ρ_{1,max} and the current stagnation generation count ρ_1 of the global operator LSHADE_EIG;

initializing the scale factor parameter ls_eval of the interior point method;

the stagnation generation count ρ_2 records the accumulated number of stagnant generations while the local operator CMA-ES is solving;

S6, adaptively selecting a mutation strategy with the Q-Learning algorithm;

S8, updating the Q-Table matrix that adaptively controls the mutation strategy;

S9, judging whether

holds, where

if so, the stagnation generation count ρ_1 is set to zero and x^gmin is replaced by

; otherwise, the stagnation generation count ρ_1 is incremented by 1; after ρ_1 is updated, returning to step S5;

S10, according to

S12, judging whether ρ_2 is less than ρ_{2,max} and FES is less than MAX_FES; if so, going to step S13; otherwise, updating the Q-Table matrix that adaptively controls the trigger parameter ρ_{2,max} and going to step S15, which marks the end of the CMA-ES local search;

S14, judging whether

holds,

if so, the stagnation generation count ρ_2 is set to zero and x^gmin is replaced by

; otherwise, the stagnation generation count ρ_2 is incremented by 1; after ρ_2 is updated, returning to step S12;

S15, judging whether the current number of solution evaluations FES is less than 0.75·MAX_FES; if so, returning to step S4; if FES is not less than MAX_FES, going to step S16; if FES is greater than 0.75·MAX_FES and less than MAX_FES, updating the global optimal solution with the local operator interior point method; judging whether

holds; if so, x^gmin is replaced by

, where

is the optimal solution obtained by the interior point method; finally, updating the parameters of the local operator interior point method and going to step S16;

S16, judging whether the current number of solution evaluations FES is greater than or equal to MAX_FES; if so, the solution ends and the current x^gmin is the final result; otherwise, returning to step S4.

2. The reinforced hybrid differential evolution method for interplanetary exploration orbit design according to claim 1, characterized in that: the Q-Table matrix of the parameter ρ_{1,max} is Q_DE; the states of Q_DE are divided according to the first parameter Sc_DE1 and the first parameter f_DE1 and combined to obtain six population evolution states, and Q_DE includes seven preset first action update values; the population evolution states serve as the rows of matrix Q_DE and the first action update values serve as its columns; the first parameter Sc_DE1 comprises three states; the first parameter f_DE1 comprises two states.

3. The reinforced hybrid differential evolution method for interplanetary exploration orbit design according to claim 2, characterized in that the specific formula for updating the trigger parameter ρ_{1,max} is:

where ε_1 is one of the seven first action update values.

4. The reinforced hybrid differential evolution method for interplanetary exploration orbit design according to claim 1, characterized in that: the Q-Table matrix that adaptively controls the mutation strategy is Q_strategy; the states of Q_strategy are divided according to the second parameter Sc_DE2 and the second parameter f_DE2 and combined to obtain twenty population evolution states, and Q_strategy includes six mutation strategies; the population evolution states serve as the rows of matrix Q_strategy and the mutation strategies serve as its columns; the second parameter Sc_DE2 comprises five states; the second parameter f_DE2 comprises four states.

5. The reinforced hybrid differential evolution method for interplanetary exploration orbit design according to claim 1, characterized in that: the Q-Table matrix of the parameter ρ_{2,max} is Q_CMA; the states of Q_CMA are divided according to the parameter Suc_CMA and the parameter Ratio_CMA and combined to obtain eight population evolution states, and Q_CMA includes five preset second action update values; the population evolution states serve as the rows of matrix Q_CMA and the second action update values serve as its columns; the parameter Suc_CMA comprises two states; the parameter Ratio_CMA comprises four states.

6. The reinforced hybrid differential evolution method for interplanetary exploration orbit design according to claim 5, characterized in that the specific formula for updating the trigger parameter ρ_{2,max} is:

where ε_2 is one of the five second action update values.

7. A reinforced hybrid differential evolution system for interplanetary exploration orbit design, characterized in that the system comprises:

the deep space track design problem construction module comprises:

The deep space track design problem parameter initialization module:

initializing the Q-learning parameters: learning rate α and discount factor γ;

initializing the scale factor parameter ls_eval of the interior point method;

the deep space track design problem global solving module comprises:

updating the Q-Table matrix that adaptively controls the mutation strategy;

judging whether

holds, where

if so, the stagnation generation count ρ_1 is set to zero and x^gmin is replaced by

the local solving module of the deep space track design problem comprises:

according to

judging whether ρ_2 is less than ρ_{2,max} and FES is less than MAX_FES; if so, starting the local operator CMA-ES to solve within the local search space; otherwise, updating the Q-Table matrix that adaptively controls the trigger parameter ρ_{2,max} and entering the convergence solving module of the deep space track design problem;

judging whether

holds,

the deep space track design problem convergence solving module comprises:

the first solving part: judging whether the current number of solution evaluations FES is less than 0.75·MAX_FES; if so, returning to the global solving module of the deep space track design problem; if FES is not less than MAX_FES, entering the second solving part; if FES is greater than 0.75·MAX_FES and less than MAX_FES, updating the global optimal solution with the local operator interior point method; judging whether

holds; if so, x^gmin is replaced by

, where

is the optimal solution obtained by the interior point method; finally, updating the parameters of the local operator interior point method and entering the second solving part;

the second solving part: judging whether the current number of solution evaluations FES is greater than or equal to MAX_FES; if so, the solution ends and the current x^gmin is the final result; otherwise, returning to the global solving module of the deep space track design problem.