CN107730004B

CN107730004B - Self-play training method for a GFT intelligent decision model with iterative self-strategy absorption

Info

Publication number: CN107730004B
Application number: CN201710851136.7A
Authority: CN
Inventors: 孙智孝; 费思邈; 管聪; 姚宗信; 杨芳; 朴海音; 杜冲; 葛俊
Original assignee: Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Current assignee: Shenyang Aircraft Design and Research Institute Aviation Industry of China AVIC
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2021-12-28
Anticipated expiration: 2037-09-20
Also published as: CN107730004A

Abstract

The invention provides a self-play training method for a GFT intelligent decision model with iterative self-strategy absorption, which comprises the following steps: starting from the Nth generation of self-play training, ranking the strategy confrontation capability of the GFT algorithm models of the previous N generations; selecting the best n GFT algorithm models and combining them, with the n weights summing to 1, to form the combined GFT of the Nth generation; selecting a heuristic optimization method and optimizing the n GFT weights, whose sum is 1, in strategy confrontation to obtain n optimized weights; sorting the n weights and deleting the m GFTs with the smallest weights, where m ≪ n; and selecting the m GFTs with the highest capability rank from the GFT algorithm models generated in the (N + k)th generation of self-play training and adding them to the combined GFT to form a new combined GFT. The method enables a spiral increase in the decision-making capability of the algorithm model.

Description

Self-play training method for a GFT intelligent decision model with iterative self-strategy absorption

Technical Field

The invention belongs to the field of intelligent algorithms for unmanned aerial vehicles, and in particular relates to a self-play training method for a GFT intelligent decision model with iterative self-strategy absorption.

Background

The GFT (genetic fuzzy tree) is a highly practical intelligent decision algorithm. It has been shown to achieve flight control and tactical decision-making for unmanned fighters in high-fidelity simulated air combat missions, which indicates that a well-trained GFT algorithm can be used for intelligent decision-making in certain strongly tactical scenarios.

In engineering software implementations of the GFT algorithm model, self-play training is a very important step. During self-play training, a number of simulated combat environments are established, and each combat game environment is a zero-sum game (every engagement must end in a win or a loss). In each environment, two AIs with the same basic model, driven by GFT algorithms with two different sets of parameters, engage in strategy confrontation. The GFT decision algorithm model of the winning AI in each environment is selected and, after copying, crossover and mutation, enters the next generation of strategy confrontation, so that the strategy confrontation capability of the GFT algorithm model improves through repeated self-play iteration. The design of the self-play training method is therefore critical: a good method gives the GFT algorithm model fast strategy absorption during training and greatly improves its decision-making capability.
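The generational loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the patent: a model is reduced to a list of floats, and `play_match`, `crossover`, `mutate` and `next_generation` are hypothetical names standing in for the simulated combat environment and the genetic operators.

```python
import random

def play_match(a, b):
    # Toy stand-in for one zero-sum engagement: exactly one side wins.
    # Assumption: the larger parameter sum wins (a real environment would simulate combat).
    return a if sum(a) >= sum(b) else b

def crossover(p, q, rng):
    # Single-point crossover between two parameter lists.
    cut = rng.randrange(1, len(p))
    return p[:cut] + q[cut:]

def mutate(p, rng, scale=0.1):
    # Gaussian perturbation of every parameter.
    return [x + rng.gauss(0.0, scale) for x in p]

def next_generation(population, rng):
    """Pair models off, keep each winner, then refill by crossover and mutation."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    winners = [play_match(shuffled[i], shuffled[i + 1])
               for i in range(0, len(shuffled) - 1, 2)]
    children = list(winners)  # winners are copied unchanged
    while len(children) < len(population):
        p, q = rng.choice(winners), rng.choice(winners)
        children.append(mutate(crossover(p, q, rng), rng))
    return children

rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
for _ in range(3):
    population = next_generation(population, rng)
```

The population size stays constant from one generation to the next; only the winners and their offspring carry over.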

After many generations of self-play iteration, the several GFT models with the highest win rates, or equivalently several groups of GFT model parameters (GFT1, GFT2, GFT3, …, GFTn), exhibit the following problem: the GFT algorithm model of each generation beats the GFT of the previous generation, but these generation-to-generation wins do not translate into an increase in the actual decision-making capability of the GFT algorithm model.

As shown in FIG. 1, GFT2, the winner of the second-generation strategy confrontation, defeats GFT1, the winner of the first generation; GFT3, the winner of the third generation, defeats GFT2; and GFT3 defeats GFT1. The AIs trained at each iteration stage defeat one another, yet their strategy confrontation capability does not substantially improve. Such a self-play training method can cause the GFT's absorption of strategies during iterative training to stall, so the goal of spirally raising the strategy capability of the GFT algorithm model through generation-by-generation training is not achieved.
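The stall can be illustrated with a toy example. The sketch below is not taken from the patent: it uses rock-paper-scissors as a hypothetical stand-in for GFT strategies and shows a sequence in which every generation defeats its predecessor while the strategy in use simply cycles.

```python
# Each key defeats its value (hypothetical three-strategy game).
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def counter(strategy):
    """Return the strategy that defeats the given one."""
    return next(s for s, loser in BEATS.items() if loser == strategy)

# Each new generation is trained only to beat the previous winner.
history = ["rock"]
for _ in range(5):
    history.append(counter(history[-1]))

# Every generation beats its predecessor...
beats_previous = all(BEATS[new] == old for old, new in zip(history, history[1:]))
# ...but after three generations the same strategy reappears.
cycled = history[3] == history[0]
```

Here `beats_previous` and `cycled` both hold, so winning against the previous generation alone says nothing about overall capability.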

Disclosure of Invention

The invention aims to provide a self-play training method for a GFT intelligent decision model with iterative self-strategy absorption, which overcomes or alleviates at least one of the defects in the prior art.

The purpose of the invention is achieved by the following technical solution: a self-play training method for a GFT intelligent decision model with iterative self-strategy absorption, comprising the following steps:

step one: starting from the Nth generation of self-play training, ranking the strategy confrontation capability of the GFT algorithm models of the previous N generations;

step two: the best n GFT algorithm models are selected and combined:

comGFT = λ1·GFT1 + λ2·GFT2 + … + λn·GFTn

where λ1 + λ2 + … + λn = 1, that is, the n weights sum to 1, and comGFT is the combined GFT of the Nth generation;

step three: selecting a heuristic optimization method and optimizing the n GFT weights, whose sum is 1, in strategy confrontation to obtain n optimized weights;

step four: sorting the n weights and deleting the m GFTs with the smallest weights, where m ≪ n;

step five: selecting the m GFTs with the highest capability rank from the GFT algorithm models generated in the (N + k)th generation of self-play training, and adding these m GFTs to the comGFT to form a new comGFT.

Preferably, in step one, the GFTs of each version are ranked by fitness value.

Preferably, the heuristic optimization method in step three is a genetic algorithm (GA); a GGFT model is formed by the optimization, and selection, crossover, mutation and recombination are applied to the model so that the capability of the resulting GGFT model improves over the self-play iterations.

Preferably, the m GFTs with the highest capability rank selected in step five do not duplicate the GFTs selected in the Nth generation, and GFTs carrying mutated genes are preferentially selected.

The beneficial effect of the self-play training method for a GFT intelligent decision model with iterative self-strategy absorption is that it solves the problem of poor iterative strategy absorption by the GFT algorithm model during self-play training. With this GFT self-play training method, the self-strategy absorption capability of the GFT algorithm model during iteration is markedly better than with an ordinary self-play training method, and the decision-making capability of the algorithm model can increase in a spiral.

Drawings

FIG. 1 is a schematic diagram of the loop in which a prior-art self-play training method causes the GFT's strategy absorption capability to stall during iterative training;

FIG. 2 is a schematic diagram of ranking the strategy confrontation capability of GFT algorithm models in the self-play training method of the present invention;

FIG. 3 is a flowchart of optimizing GFT weights using genetic algorithm GA according to one embodiment of the present invention.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described in more detail below with reference to the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The described embodiments are only some, not all, of the embodiments of the invention. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.

The self-play training method for a GFT intelligent decision model with iterative self-strategy absorption is explained in further detail below with reference to an embodiment.

A self-play training method for a GFT intelligent decision model with iterative self-strategy absorption comprises the following implementation steps.

Step one: starting from the 5th generation of self-play training, the strategy confrontation capability of the GFT algorithm models of the previous 5 generations is ranked, with the GFTs of each version ranked by fitness value, as shown in FIG. 2.
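The ranking in step one can be sketched as a simple sort. The record type, field names and fitness values below are hypothetical; the patent states only that GFTs from the earlier generations are ranked by fitness value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GFTRecord:
    name: str          # hypothetical identifier of one GFT version
    generation: int    # generation in which the model was produced
    fitness: float     # assumed measure, e.g. win rate in strategy confrontation

def rank_by_fitness(records, top_n):
    """Return the top_n records, highest fitness first."""
    return sorted(records, key=lambda r: r.fitness, reverse=True)[:top_n]

# Illustrative pool drawn from several generations (made-up values).
pool = [
    GFTRecord("g1-a", 1, 0.52),
    GFTRecord("g2-a", 2, 0.61),
    GFTRecord("g3-a", 3, 0.58),
    GFTRecord("g4-a", 4, 0.70),
    GFTRecord("g5-a", 5, 0.66),
]
best = rank_by_fitness(pool, top_n=3)
```

Because the pool spans all earlier generations, a strong model from an old generation can outrank a weaker recent one.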

Step two: the best 20 GFT algorithm models (i.e. the top 20 GFTs in the capability ranking) are selected and combined:

comGFT = λ1·GFT1 + λ2·GFT2 + … + λ20·GFT20

where λ1 + λ2 + … + λ20 = 1, that is, the 20 weights sum to 1, thus forming the combined GFT of generation 5, i.e., comGFT.

By means of linear combination of better GFT strategy models, absorption of strategies learned by the GFT of previous generations can be achieved.
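One way to read the linear combination is as a weighted sum of the sub-models' outputs. That reading is an assumption: the patent does not say whether the weights apply to model outputs or to model parameters. In the sketch below each sub-model is a hypothetical callable returning a scalar, and `normalize` and `combined_decision` are illustrative names.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

def combined_decision(models, weights, observation):
    """Weighted sum of each sub-model's output for one observation."""
    return sum(w * model(observation) for w, model in zip(weights, models))

# Three toy sub-models standing in for GFT1..GFT3.
models = [lambda x: x, lambda x: 2 * x, lambda x: -x]
weights = normalize([2.0, 1.0, 1.0])   # -> [0.5, 0.25, 0.25]
output = combined_decision(models, weights, 4.0)
```

With these toy values the combined output is 0.5·4 + 0.25·8 + 0.25·(−4) = 3.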

Step three: a heuristic optimization method is selected, and the 20 GFT weights, whose sum is 1, are optimized in strategy confrontation. The heuristic optimization method chosen is a genetic algorithm (GA): a GGFT model is formed by the optimization, and selection, crossover, mutation and recombination are applied to the model so that the capability of the GGFT model improves over the self-play iterations. The detailed process is as follows (see FIG. 3):
1) establishing chromosomes for the n weights of the n GFTs in the GGFT of this generation;
2) initializing the population size and the maximum number of evolution generations;
3) setting the weight parameter range;
4) generating the population P(t);
5) placing each GGFT in the population into the strategy confrontation environment and training the GGFT model;
6) calculating the fitness of each chromosome;
7) calculating the cumulative probability of each individual;
8) evolving through replication, crossover and mutation while retaining the optimal chromosome;
9) generating the population P(t + 1) and judging whether the convergence condition is met; if not, returning to 4); if so, proceeding to the next step;
10) decoding the chromosomes and extracting the optimal GGFT parameters;
11) deleting the m sub-GFT models with the smallest weights and introducing m newly mutated GFT models with top-ranked strategy capability from self-play training;
12) forming the new GGFT.
This yields the 20 optimized weights of this embodiment.
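The weight optimization in 1) to 10) can be sketched as a small genetic algorithm over weight vectors. Everything here is illustrative: `optimize_weights`, `roulette_pick` and the toy fitness function are hypothetical, the fitness in the patent comes from strategy confrontation rather than a formula, and the sketch stops after a fixed number of generations instead of testing a convergence condition.

```python
import random

def normalize(w):
    """Rescale a weight vector so that it sums to 1."""
    total = sum(w)
    return [x / total for x in w]

def roulette_pick(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to fitness
    by walking the cumulative fitness (fitness values must be positive)."""
    threshold = rng.random() * sum(fitnesses)
    cumulative = 0.0
    for chrom, fit in zip(population, fitnesses):
        cumulative += fit
        if cumulative >= threshold:
            return chrom
    return population[-1]

def optimize_weights(fitness_fn, n, pop_size=30, generations=40,
                     mutation_rate=0.2, seed=0):
    """Evolve n positive weights that sum to 1 (requires n >= 2)."""
    rng = random.Random(seed)
    population = [normalize([rng.random() + 1e-9 for _ in range(n)])
                  for _ in range(pop_size)]
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        fitnesses = [fitness_fn(c) for c in population]
        children = [best]  # retain the optimal chromosome unchanged
        while len(children) < pop_size:
            p = roulette_pick(population, fitnesses, rng)
            q = roulette_pick(population, fitnesses, rng)
            cut = rng.randrange(1, n)           # single-point crossover
            child = p[:cut] + q[cut:]
            child = [max(1e-9, g + rng.gauss(0.0, 0.05))
                     if rng.random() < mutation_rate else g
                     for g in child]            # per-gene mutation
            children.append(normalize(child))   # restore the sum-to-1 constraint
        population = children
        best = max(population, key=fitness_fn)
    return best

# Toy fitness (assumption): weighted average of made-up sub-model skills.
skills = [0.2, 0.9, 0.5, 0.4]
toy_fitness = lambda w: sum(wi * si for wi, si in zip(w, skills))
best_weights = optimize_weights(toy_fitness, n=len(skills))
```

Renormalizing after every crossover and mutation keeps each chromosome a valid weight vector, which is one simple way to honour the sum-to-1 constraint; other encodings are possible.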

Step four: the 20 weights are sorted, and the 4 GFTs with the smallest weights are deleted.
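Step four amounts to dropping the entries with the smallest optimized weights. A minimal sketch, assuming the weights are held in a dictionary keyed by a hypothetical model name (the values shown are made up):

```python
def prune_lowest(weights, m):
    """Return a copy of `weights` without the m smallest-weight entries."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    keep = set(ranked[:len(weights) - m])
    return {name: w for name, w in weights.items() if name in keep}

optimized = {"GFT1": 0.30, "GFT2": 0.05, "GFT3": 0.25, "GFT4": 0.02, "GFT5": 0.38}
kept = prune_lowest(optimized, m=2)
```

In the embodiment the same operation would be applied with 20 weights and m = 4.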

Step five: the 4 GFTs with the highest capability rank are selected from the GFT algorithm models generated in the 6th generation of self-play training and added to the comGFT to form a new comGFT, where the 4 GFTs do not duplicate any GFT selected in the 5th generation and GFTs carrying mutated genes are preferred. This completes the iterative update of the comGFT.
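Step five can be sketched as a filtered selection followed by a merge. Several details are assumptions, since the patent does not specify them: "mutated genes are preferred" is read as sorting mutants ahead of non-mutants, newcomers receive the mean weight of the retained models, and all weights are then renormalized. The function name and the tuple layout are hypothetical.

```python
def replenish(kept, candidates, already_selected, m):
    """Add m new models to the retained ones and renormalize the weights.

    kept: dict name -> weight of the retained models
    candidates: list of (name, fitness, is_mutant) from the newer generation
    already_selected: names chosen in the earlier generation (must not be reused)
    """
    eligible = [c for c in candidates if c[0] not in already_selected]
    # Assumption: mutants first, then higher fitness.
    eligible.sort(key=lambda c: (c[2], c[1]), reverse=True)
    newcomers = [name for name, _, _ in eligible[:m]]
    # Assumption: newcomers start at the mean weight of the retained models.
    mean_weight = sum(kept.values()) / len(kept)
    merged = dict(kept)
    merged.update({name: mean_weight for name in newcomers})
    total = sum(merged.values())
    return {name: w / total for name, w in merged.items()}

retained = {"a": 0.5, "b": 0.3}
previous = {"a", "b", "c"}
new_generation = [("c", 0.9, False), ("d", 0.8, False),
                  ("e", 0.7, True), ("f", 0.6, False)]
new_com = replenish(retained, new_generation, previous, m=2)
```

The starting weights of the newcomers matter little in practice under this reading, because the next round of weight optimization adjusts them again.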

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A self-play training method for a GFT intelligent decision model with iterative self-strategy absorption, used for flight control and tactical decision-making of an unmanned fighter, characterized by comprising the following steps:

step one: establishing a plurality of simulated confrontation environments, wherein each confrontation game environment is a zero-sum game and each simulated confrontation environment contains two unmanned fighter models operated by GFT algorithms with different parameters for strategy confrontation; selecting, from each simulated confrontation environment, the GFT algorithm model corresponding to the winning unmanned fighter model; copying, crossing and mutating the GFT algorithm model and entering the next generation of strategy confrontation; and, starting from the Nth generation, ranking the strategy confrontation capability of the GFT algorithm models of the previous N generations;

step two: the best n GFT algorithm models are selected and combined:

comGFT = λ1·GFT1 + λ2·GFT2 + … + λn·GFTn

step five: selecting the m GFTs with the highest capability rank from the GFT algorithm models generated in the (N + k)th generation of self-play training, and adding these m GFTs to the comGFT to form a new comGFT, which serves as the algorithm model for operating the unmanned fighter model.

2. The self-play training method for a GFT intelligent decision model with iterative self-strategy absorption according to claim 1, wherein in step one the GFTs of each version are ranked by fitness value.

3. The self-play training method for a GFT intelligent decision model with iterative self-strategy absorption according to claim 1, wherein the heuristic optimization method in step three is a genetic algorithm (GA), a GGFT model is formed by the optimization, and selection, crossover, mutation and recombination are applied to the model so that the capability of the resulting GGFT model improves over the self-play iterations.

4. The self-play training method for a GFT intelligent decision model with iterative self-strategy absorption according to claim 1, wherein the top-ranked m GFTs selected in step five do not duplicate the GFTs selected in the Nth generation, and GFTs carrying mutated genes are preferentially selected.