CN112306070A

CN112306070A - Multi-AUV dynamic maneuver decision method based on interval information game

Info

Publication number: CN112306070A
Application number: CN202011150930.7A
Authority: CN
Inventors: 刘禄; 张立川; 白春梅; 张硕; 任染臻
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2020-10-24
Filing date: 2020-10-24
Publication date: 2021-02-02

Abstract

The invention relates to a multi-AUV dynamic maneuver decision method based on an interval information game. First, on the basis of the multi-AUV maneuvering strategy, an advantage function consisting of a situation advantage and an energy efficiency advantage is proposed. Subsequently, a payment matrix is constructed from interval information and a payment interval ranking that combines four-parameter interval sets with relative entropy. Then, the Nash equilibrium conditions satisfying the interval game are given, and a Nash equilibrium maneuver decision model for the dynamic marine environment is established. Meanwhile, an improved differential evolution algorithm is applied to solve the model and find the optimal strategy. The invention addresses the modeling difficulty caused by the weak connectivity, uncertainty and variability of the underwater environment, so that the established model is more convincing and more reliable when applied in actual waters.

Description

Multi-AUV dynamic maneuver decision method based on interval information game

Technical Field

The invention belongs to the field of multi-underwater robot cooperative confrontation, and particularly relates to a multi-AUV cooperative confrontation method based on an interval information game.

Background

With the development of science and technology, Autonomous Underwater Vehicles (AUVs) have been widely used in fields such as marine observation, marine rescue, mine area search and enemy reconnaissance. Owing to its space-time distribution and redundant configuration, a multi-AUV system is efficient and reliable, and provides a new solution for complex ocean tasks. Multi-AUV game cooperation can be used for ocean research and military countermeasures, including underwater multi-target tracking, monitoring and detection; it can effectively enlarge the underwater battle radius and reduce losses of underwater equipment and casualties.

The maneuver decision is the key to multi-AUV cooperative countermeasures and is the basic action of each countermeasure step. There is much research on unilateral strategy optimization, but little on bilateral games. Therefore, by introducing cooperative game theory into the maneuver decisions of unmanned underwater vehicle clusters, a more scientific and more accurate real-time countermeasure strategy can be made.

Disclosure of Invention

Technical problem to be solved

Existing methods ignore the complexity and uncertainty of the underwater environment and cannot obtain accurate real-time underwater environment characteristics, so the established decision model has low reliability. The conventional decision algorithms they adopt easily fall into local optima, so an accurate and reliable decision scheme is difficult to obtain and the methods cannot be applied in a real sea area. In view of these defects, the invention aims to provide a multi-AUV dynamic maneuver decision algorithm based on an interval information game.

Technical scheme

A multi-AUV dynamic maneuver decision method based on an interval information game is characterized by comprising the following steps:

Step 1: obtaining the advantage functions of the multi-AUV systems of the two opposing sides from the situation advantage and the energy efficiency advantage:

the situation advantage comprises an angle advantage A_ag, a speed advantage A_s and a distance advantage A_dis;

wherein |AA| is the viewing angle of the two AUV players, and ATA is the target incident angle;

wherein v_n1i and v_n2j are the velocity vectors of the two sides in the game; the subscripts n1 and n2 denote the two opposing sides, and i and j denote the i-th and j-th AUV of the respective sides;

wherein D_ij is the distance between different AUVs; r₀ = (R_max + R_min)/2; R_max is the maximum starting distance, and R_min is the minimum starting distance;

the overall situation advantage function is: w_A = k₁A_ag + k₂A_s + k₃A_dis, wherein k₁, k₂, k₃ are weighting coefficients and k₁ + k₂ + k₃ = 1;

The energy efficiency advantage function is written as:

wherein C_n1i and C_n2j are the energy efficiencies of the AUV systems of the two opposing sides;

the overall advantage function of our multi-AUV system is:

wherein δ₁ and δ₂ are weighting coefficients satisfying δ₁ + δ₂ = 1;

Representing a merit function having upper and lower bounds;

obtaining the overall advantage function of the enemy in the game in the same way

Step 2: obtaining the payment matrix of the multi-AUV system from the interval information and the payment interval ranking that combines four-parameter interval sets with relative entropy:

the payment function of the multi-AUV game under uncertain information, obtained from the advantage function of step 1, is established as follows:

wherein x_ij and y_ji are binary decision variables; x_ij = 1 denotes that the i-th AUV of our side attacks the j-th AUV of the enemy, and x_ij = 0 denotes that the i-th AUV of our side does not attack the j-th AUV of the enemy; likewise, y_ji indicates whether the j-th AUV of the enemy attacks the i-th AUV of our side;

the payment matrix under uncertain information is therefore:

the method combining the four parameter interval sets and the relative entropy is improved to ensure that

W_mn is the normalized advantage function,

wherein x₁, x₂, …, x_m are the maneuver strategies of our multi-AUV system, and y₁, y₂, …, y_n are the maneuver strategies of the enemy multi-AUV system,

represents payment when the m-th policy is used by my AUV system and the n-th policy is used by the enemy AUV system;

Step 3: solving for the Nash equilibrium optimal solution and finding the optimal strategy:

the confrontation trajectory of the multiple AUVs is regarded as a combination of actions; a k-level dynamic game is used in the confrontation process, and each level of the game comprises 7 actions, namely maintaining the current course, accelerating, decelerating, turning left, turning right, climbing and diving;

considering the practical underwater environment constraints, solving the Nash equilibrium problem can be converted into an optimization problem with interval uncertainty parameters:

wherein x_iIndicating the probability of my AUV adopting the ith policy,

is the threshold value of the benefit of the participant,

the payment of my AUV system using the ith strategy and the payment of the enemy AUV system using the jth strategy;

the optimal parameter x_i in the optimal solution is the probability of adopting the optimal strategy at the current moment, and the AUV executes the action corresponding to the optimal strategy.

In step 3, an improved differential evolution algorithm is adopted to solve for the Nash equilibrium optimal solution and find the optimal strategy; the improved differential evolution algorithm comprises the steps of mutation, crossover and selection. In order to select the best fitness in combination with the game algorithm, when a new mutation vector is generated, the scaling factor F is determined according to the evolution time and the difference between the best individual and the worst individual:

wherein Δg = g/G, G is the maximum number of iterations, and g is the current number of iterations; f_best is the best fitness, f_worst is the worst fitness, and f_i is the fitness of the current individual; F_max and F_min are the maximum and minimum values of F.

Advantageous effects

The multi-AUV dynamic maneuver decision method based on the interval information game has the following beneficial effects that:

(1) The modeling difficulty caused by the weak connectivity, uncertainty and variability of the underwater environment is addressed. Interval information can represent underwater environment characteristics that include various uncertainties, so the established model is more convincing and more reliable when applied in actual waters.

(2) The problem of the decision algorithm falling into a local optimum is addressed. The method uses the improved differential evolution algorithm, which searches for the optimal solution globally and effectively avoids being trapped in a local optimum, so the obtained result is more accurate and credible.

Drawings

FIG. 1: k-level dynamic game of the multi-AUV system

FIG. 2: Flow of the IDE algorithm

FIG. 3: Expected revenue

FIG. 4: Multi-AUV collaborative dynamic maneuver decision: first stage

FIG. 5: Multi-AUV collaborative dynamic maneuver decision: second stage

FIG. 6: Multi-AUV collaborative dynamic maneuver decision: third stage

FIG. 7: Multi-AUV collaborative dynamic maneuver decision: fourth stage

FIG. 8: Multi-AUV collaborative dynamic maneuver decision: fifth stage

Detailed Description

The invention will now be further described with reference to the following examples and drawings:

The technical scheme is as follows. First, on the basis of the multi-AUV maneuvering strategy, an advantage function consisting of a situation advantage and an energy efficiency advantage is proposed. Subsequently, a payment matrix is constructed from interval information and a payment interval ranking that combines four-parameter interval sets with relative entropy. Then, the Nash equilibrium conditions satisfying the interval game are given, and a Nash equilibrium maneuver decision model for the dynamic marine environment is established. Meanwhile, an improved differential evolution algorithm is applied to solve the model and find the optimal strategy. Finally, the superiority of the proposed multi-AUV dynamic maneuver decision algorithm is verified by an example. The method comprises the following steps:

In order to establish the interval information payment matrix, a multi-AUV maneuver attribute evaluation method is provided according to the situation information of the two opposing sides, as shown in the k-level dynamic game of FIG. 1. The confrontation trajectory of the multiple AUVs is treated as a combination of actions. The enemy multi-AUV system and our multi-AUV system are the decision-making parties in the game.

The gaming model of the multiple AUV system based on uncertain information can be expressed as:

wherein N = {N₁, N₂} is the set of decision-making parties in the game;

is the policy space of the decision maker,

meaning that we choose the strategy of the ith category,

indicating that the enemy selects the jth strategy in the kth stage;

is the revenue interval corresponding to each strategy that the multi-AUV systems participating in the game may select. According to the game tree shown in FIG. 1, the actions of the multi-AUV system in the stage-k game can be represented by one information set, so a maneuver strategy is in effect a set of action rules of the multi-AUV system for each information set.

The main difference between multi-AUV confrontation and the confrontation of other autonomous robots is the mode of information transfer. Because of the marine environment, information in the multi-AUV confrontation process is mainly carried by underwater acoustic waves. The shallow-sea acoustic channel varies in space, time and frequency. It has strong multipath interference, high environmental noise, large transmission loss and a serious Doppler shift effect. Therefore, the information available during multi-AUV confrontation is highly uncertain, and it is difficult to accurately quantify the threat level of both sides in the decision-making process. For this reason, in the present invention each attribute is represented by an interval information set in the decision process. The advantage function that evaluates the payment of each AUV consists of two parts, a situation advantage and an energy efficiency advantage.

In order to attack an enemy multi-AUV system, it is necessary to occupy a favorable attack position and minimize the attack risk of our multi-AUV system.

(1) The situation advantage comprises an angle advantage A_ag, a speed advantage A_s and a distance advantage A_dis;

wherein |AA| is the viewing angle of the two players, and ATA is the target incident angle;

wherein v_n1i and v_n2j are the velocity vectors of the two sides in the game; the subscripts n1 and n2 denote the two opposing sides, and i and j denote the i-th and j-th AUV of the respective sides.

The overall situation advantage function is: w_A = k₁A_ag + k₂A_s + k₃A_dis, wherein k₁, k₂, k₃ are weighting coefficients and k₁ + k₂ + k₃ = 1.

(2) The energy efficiency merit function may be written as:

wherein C_n1i and C_n2j are the energy efficiencies of the AUV systems of the two opposing sides.

(3) The overall merit function of the multi-AUV system of our party is:

Representing a merit function with upper and lower bounds.

(4) By exchanging the situation information parameters of the two sides, the overall advantage function W_E₂ of the enemy in the game can be obtained.
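The weighted situation advantage in item (1) can be illustrated with a minimal Python sketch. It assumes that each advantage term is an interval given as a (lower, upper) pair and that, because the weights are non-negative, the bounds can be weighted separately. The function name and the sample interval values are hypothetical; only the weights k₁, k₂, k₃ are taken from the worked example in this description.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def weighted_interval_sum(values: Sequence[Interval],
                          weights: Sequence[float]) -> Interval:
    """Weighted sum of intervals using non-negative weights that sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    lower = sum(w * lo for w, (lo, _) in zip(weights, values))
    upper = sum(w * hi for w, (_, hi) in zip(weights, values))
    return (lower, upper)


# Hypothetical interval values for A_ag, A_s and A_dis (not from the patent).
a_ag, a_s, a_dis = (0.6, 0.8), (0.4, 0.5), (0.7, 0.9)
k = (0.445, 0.222, 0.333)  # k1, k2, k3 from the worked example
w_a = weighted_interval_sum([a_ag, a_s, a_dis], k)
print(w_a)
```

The result is again an interval, which matches the statement that the advantage function has upper and lower bounds.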

Step 2: obtaining the payment matrix of the multi-AUV system from the interval information and the payment interval ranking that combines four-parameter interval sets with relative entropy.

The revenue matrix of the multi-AUV system is constructed. The payment matrix consists of interval information and a payment interval ranking that combines four-parameter interval sets with relative entropy. The payment in a game is the final gain or loss of a player under the selected strategies. In multi-AUV confrontation, a gain for our AUVs is necessarily a loss for the enemy AUVs. Thus, the game of the present invention is a two-player zero-sum game.

Because of the various underwater interference factors, the multi-AUV system cannot obtain accurate information in actual underwater maneuver decisions. After a reasonable analysis of the confrontation situation, each interference factor can usually be taken to vary within a certain interval. Thus, a revenue matrix for each multi-AUV system is established based on the interval information.

wherein x_ij and y_ji are binary decision variables; x_ij = 1 denotes that our i-th AUV attacks the enemy's j-th AUV, and x_ij = 0 denotes that our i-th AUV does not attack the enemy's j-th AUV; likewise, y_ji indicates whether the enemy's j-th AUV attacks our i-th AUV.

Interval information sets cannot be compared by magnitude in the way real numbers can. Ranking methods based on degree of possibility may fail, while ranking methods based on geometric distance may cause significant information loss. To avoid these drawbacks, a method combining four-parameter interval sets and relative entropy is proposed.

The payment interval is derived from the combined information of the two sides of the game, but it does not take into account how points are distributed within the interval. In practice, the payment interval cannot simply be regarded as uniformly distributed; the distribution should change as the underwater confrontation changes. For a strategy x_i, when the confrontation situation favors our side, our payment inevitably tends toward f^R; otherwise it tends toward f^L. To make full use of the advantage matrix information, the payment interval is converted into a four-parameter interval set.

The payment matrix under uncertain information is therefore:

W_mn is the normalized advantage function,

indicates the payment when our AUV system uses the m-th strategy and the enemy AUV system uses the n-th strategy.

The basic idea of the ranking method is to use the information entropy to measure the difference between the AUV self income and the maximum income (minimum income) under different strategies, and select the strategy with the minimum difference between the AUV self income and the maximum income (or the strategy with the maximum difference between the AUV self income and the minimum income). In fact, the highest reward indicates that the AUV has completed the intended task and there is no casualty. The lowest profit represents the situation where the AUV has not completed the intended task and casualties are greatest.

Step 3: applying the improved differential evolution algorithm to solve for the Nash equilibrium optimal solution and find the optimal strategy.

The confrontation trajectory of the multiple AUVs is regarded as a combination of actions; a k-level dynamic game is used in the confrontation process, and each level of the game basically comprises 7 actions, namely maintaining the current course, accelerating, decelerating, turning left, turning right, climbing and diving.
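To give a sense of the size of the resulting strategy space, the seven basic actions can be enumerated as below. This is only an illustrative sketch; the names `Maneuver` and `action_sequences` are hypothetical and do not come from the patent.

```python
from enum import Enum
from itertools import product


class Maneuver(Enum):
    """The seven basic actions available at each level of the game."""
    KEEP = "maintain course"
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    TURN_LEFT = "turn left"
    TURN_RIGHT = "turn right"
    CLIMB = "climb"
    DIVE = "dive"


def action_sequences(k: int):
    """All length-k action sequences for a single AUV (7**k in total)."""
    return list(product(Maneuver, repeat=k))


print(len(action_sequences(2)))  # number of two-level sequences for one AUV
```

Since the number of sequences grows as 7 to the power k for each AUV, exhaustive enumeration quickly becomes impractical, which is one motivation for solving the game with a population-based optimizer.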

wherein x_iIndicating the probability of my AUV adopting the ith policy,

is the threshold value of the benefit of the participant,

is the payment when our AUV system uses the i-th strategy and the enemy AUV system uses the j-th strategy.
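As a hedged illustration of how a mixed strategy interacts with interval payments, the sketch below computes the expected payment interval for given strategy probabilities. It assumes that each matrix entry is a (lower, upper) pair and that the enemy also plays a mixed strategy y; the function name and the sample matrix are hypothetical, and the patent's own optimization formula is not reproduced here.

```python
def expected_interval_payoff(x, y, payoff):
    """Expected payment interval for mixed strategies x (ours) and y (enemy).

    payoff[i][j] is a (lower, upper) interval for our strategy i against
    enemy strategy j. Probabilities are non-negative, so the lower and
    upper bounds can be averaged separately.
    """
    for p in (x, y):
        if any(v < 0 for v in p) or abs(sum(p) - 1.0) > 1e-9:
            raise ValueError("each strategy must be a probability vector")
    pairs = [(i, j) for i in range(len(x)) for j in range(len(y))]
    lower = sum(x[i] * y[j] * payoff[i][j][0] for i, j in pairs)
    upper = sum(x[i] * y[j] * payoff[i][j][1] for i, j in pairs)
    return (lower, upper)


# Hypothetical 2x2 interval payment matrix (values are not from the patent).
demo_matrix = [[(0.0, 1.0), (2.0, 3.0)],
               [(4.0, 5.0), (6.0, 7.0)]]
print(expected_interval_payoff([0.5, 0.5], [0.5, 0.5], demo_matrix))
```

An optimizer such as the differential evolution algorithm described next would search over x to improve this expected payment subject to the equilibrium constraints.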

Differential evolution (DE) is an intelligent optimization method based on population differences. DE uses the differences between individuals in the population to perturb individual evolution, and searches the entire optimization space with a greedy rule to find an optimal solution. It updates the population through mutation, crossover and selection and thereby finds the best solution. DE is easy to operate, robust, and has strong optimization ability. However, when the DE algorithm is applied in the optimization process, it may converge slowly and fall into a local optimum, which makes it difficult to meet the requirements of real-time confrontation.

Aiming at the problems of the DE algorithm, the invention provides an improved differential evolution algorithm (IDE). The algorithm flow is shown in fig. 2:

(1) Fitness function

Generally, the best strategy for participant N1 is to maximize its revenue under the constraints, while the other participant N2 does the opposite. Therefore, the fitness function here can be the optimization objective function expressed in the optimization formula.

(2) Mutation

The scaling factor F scales each base vector to generate a new mutation vector. A larger F searches for potentially optimal solutions over a wider range; conversely, a smaller F increases the convergence speed and improves accuracy. When the fitness of an individual is good, F should be small in order to reduce the disturbance to better individuals. Conversely, when the fitness of an individual is relatively poor, the search range should be expanded, so a larger F can be applied. In conjunction with the game algorithm presented herein, the scaling factor F is determined from the evolution time and the difference between the best and worst individuals:

wherein Δg = g/G, G is the maximum number of iterations, and g is the current number of iterations; f_best is the best fitness, f_worst is the worst fitness, and f_i is the fitness of the current individual; F_max and F_min are the maximum and minimum values of F. If the fitness difference between the current individual and the best individual is large, the individual is far from the best individual in the search space. A larger value of F_i then perturbs the individual more strongly, which enlarges the search range of the algorithm and enhances its global search capability. If the fitness difference is small, F_i can take a smaller value and the perturbation of the individual is smaller, so the search is performed only in a small area near the individual, which enhances the exploitation capability of the algorithm. Furthermore, in the later stages of evolution, the value of Δg is preferably relatively small, so that the search is performed in local areas near the current individual and the accuracy of the algorithm is ensured.

Using the DE current-to-best strategy, the following mutation vector is obtained:

w_i,g = v_i,g + F_i(v_best,g - v_i,g) + F_i(v_r1,g - v_r2,g)

wherein w_i,g is the mutation vector; F_i is the scaling factor of the current individual as determined by the previous equation; v_i,g is the current individual vector, and v_best,g is the best individual of the population; r1 and r2 are two different integers with 0 < r1, r2 < NP, where NP is the population size.
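The mutation formula above can be written as a short Python sketch. The scaling factor F_i is passed in as a parameter, because the adaptive formula for it is not reproduced in this text. Excluding the current index i when drawing r1 and r2 is an assumption (the description only requires r1 and r2 to differ), and the function name is hypothetical.

```python
import random


def mutate(population, i, best_index, f_i, rng=random):
    """Return w_i = v_i + F_i*(v_best - v_i) + F_i*(v_r1 - v_r2)."""
    # Assumption: r1 and r2 are distinct and also different from i,
    # so the population must hold at least three individuals.
    candidates = [r for r in range(len(population)) if r != i]
    r1, r2 = rng.sample(candidates, 2)
    v_i, v_best = population[i], population[best_index]
    v_r1, v_r2 = population[r1], population[r2]
    return [
        a + f_i * (b - a) + f_i * (c - d)
        for a, b, c, d in zip(v_i, v_best, v_r1, v_r2)
    ]


# Hypothetical one-dimensional population; index 1 is taken as the best.
demo_population = [[0.0], [2.0], [2.0], [2.0]]
print(mutate(demo_population, i=0, best_index=1, f_i=0.5))
```

With F_i = 0 the mutation vector equals the current individual, and larger values of F_i pull it further toward the best individual and along the random difference direction.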

(3) Crossing

The crossover rate CR determines the probability of crossover between the mutant and the original individual in each dimension. Individuals with a larger fitness value can use a larger CR, which accelerates changes in the individual structure. In addition, a smaller CR is preferable in the later stage of evolution, to reduce the disturbance of the trial individual and ensure the convergence speed of the algorithm. The designed crossover rate is as follows:

wherein

is the current average fitness; CR_i is the current crossover rate, and CR_max and CR_min are the maximum and minimum values of CR. When the fitness of the target individual v_i,g is less than the average fitness, the target individual is relatively superior, so a smaller CR_i should be chosen and the trial vector obtains more information from the target vector v_i,g. Otherwise, the trial vector u_i,g obtains more information from the mutation vector w_i,g, which improves the diversity of the population. Δg ensures that a larger CR_i is obtained in the early stage of evolution, increasing population diversity and speeding up convergence. In the later stage of evolution, a smaller CR_i is favorable for finding the optimal solution.

The crossover operation can be expressed as:

wherein u_ij,g is the j-th component of the trial vector u_i,g; rnbr is a random integer less than the integer D; rand[0,1] is a random number between 0 and 1.
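The crossover formula itself is not reproduced in this text. The sketch below therefore shows the standard binomial crossover of differential evolution, which uses the same symbols (CR, rnbr, rand[0,1]); treating it as the operation intended here is an assumption, and the function name is hypothetical.

```python
import random


def crossover(target, mutant, cr, rng=random):
    """Standard binomial crossover producing a trial vector.

    Component j is taken from the mutant when a uniform random number is
    at most cr, or when j equals the randomly chosen index rnbr, which
    guarantees at least one mutant component. Otherwise the component is
    copied from the target vector.
    """
    d = len(target)
    rnbr = rng.randrange(d)  # random integer in 0 .. d-1
    return [
        mutant[j] if (rng.random() <= cr or j == rnbr) else target[j]
        for j in range(d)
    ]


# Hypothetical vectors: with cr = 1.0 the trial vector equals the mutant.
print(crossover([0, 0, 0, 0], [1, 1, 1, 1], cr=1.0))
```

A small CR keeps the trial vector close to the target individual, while a large CR copies most components from the mutation vector, in line with the discussion of CR_i above.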

(4) Selection

The selection operation is to select a better fitness between the newly generated test vector and the original target vector to be a member of the next population generation. This is a "greedy" selection operation. The selection operation may be described as follows:

wherein v_i,g+1 is the individual of the next generation.
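The greedy selection step can be sketched as follows. The sketch assumes that a lower fitness value is better, consistent with the crossover discussion in which a target individual with below-average fitness is called superior; the function name and the sample fitness function are hypothetical.

```python
def select(target, trial, fitness):
    """Greedy selection: the better of target and trial survives.

    Assumption: lower fitness is better; ties go to the trial vector.
    """
    return trial if fitness(trial) <= fitness(target) else target


def sphere(v):
    """Hypothetical fitness function used only for this demonstration."""
    return sum(c * c for c in v)


print(select([2.0, 2.0], [1.0, 1.0], sphere))
```

Because an individual is only ever replaced by a trial vector that is at least as good, the best fitness in the population never gets worse from one generation to the next.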

The optimal solution is obtained with the improved differential evolution algorithm. The optimal parameter x_i in the optimal solution is the probability of adopting the optimal strategy at the current moment, and the AUV executes the action corresponding to the optimal strategy.

Example (c): assume that "a" and "D" participate in a 2-to-2 AUV underwater confrontation. The initial positions of 'a1', 'a2' are (0m, 100m, 200m), (0m, -100m, 200m), 'D1', 'D2' are (800m, 100m, 200m), (800m, -100m, 200 m). The speeds, deflection angles and pitch angles of A1 and A2 are 23m/s, -60 degrees, 5 degrees and 23m/s, 60 degrees and-5 degrees respectively; the velocities, yaw angles and pitch angles of "D1" and "D2" were 25m/s, 120 °, 3 ° and 25m/s, respectively, and-120 ° and-3 °. Both have the same control capability, and the time interval of the opposite step is 5 s. It is clear that "D" has advantages from the outset. It should also be noted that the maximum maneuver step should be determined based on the effectiveness of the AUV used in the confrontation. For comparison of the challenge performance, "a" uses the collaborative dynamic maneuver decision algorithm proposed by the present invention, and "D" uses the max-min decision algorithm in the multiple AUV challenge process. The three-dimensional challenge process with 5 main stages is shown in fig. 4-7. "+" shows the initial position and "4" shows the current position. The confrontation is ended when the expected profit of one party reaches absolute advantage.

The calculation part in the invention is as follows:

k₁＝0.445，k₂＝0.222，k₃＝0.333，δ₁＝0.9，δ₂＝0.1，

ω₁＝ω₂＝ω₃＝ω₄＝0.25 G＝300，NP＝100。

As shown in FIG. 3, the confrontation process has 50 steps, and the curve gives the expected revenue along the way. In the final part, the expected revenue obtained indicates that the Nash equilibrium of the interval information game is satisfied.

As shown in FIG. 4: "A" dominates; "A1" attempts to attack "D2" and "A2" moves toward "D1".

As shown in FIG. 5: "A1" and "A2" attempt to attack "D1", while "D2" attempts to get ahead of "A2". The situation then changes, and "D" becomes dominant in stage 3. This can also be verified in FIG. 3, where the expected revenue changes from positive to negative.

As shown in FIG. 6: "D1" and "D2" still try to attack "A2", but "A2" continues to chase "D1", and "A1" turns back toward "D2".

As shown in FIG. 7: the situation changes again; "A" dominates and the expected revenue changes from negative to positive. "A2" continues to turn and successfully drives "D1" away; then "A1" and "A2" try to attack "D2", but "D1" and "D2" escape in two different directions.

As shown in FIG. 8: finally, "A1" and "A2" both occupy dominant positions, so "A" gains absolute advantage and the confrontation ends.

Claims

1. A multi-AUV dynamic maneuver decision method based on an interval information game is characterized by comprising the following steps:

the situation advantage comprises an angle advantage A_ag, a speed advantage A_s and a distance advantage A_dis;

wherein |AA| is the viewing angle of the two AUV players, and ATA is the target incident angle;

The energy efficiency advantage function is written as:

wherein C_n1i and C_n2j are the energy efficiencies of the AUV systems of the two opposing sides;

the overall advantage function of our multi-AUV system is:

Representing a merit function having upper and lower bounds;

wherein x_ij and y_ji are binary decision variables; x_ij = 1 denotes that the i-th AUV of our side attacks the j-th AUV of the enemy, and x_ij = 0 denotes that the i-th AUV of our side does not attack the j-th AUV of the enemy; likewise, y_ji indicates whether the j-th AUV of the enemy attacks the i-th AUV of our side;

the payment matrix under uncertain information is therefore:

W_mn is the normalized advantage function,

wherein x_iIndicating the probability of my AUV adopting the ith policy,

is the threshold value of the benefit of the participant,

the optimal parameter x_i in the optimal solution is the probability of adopting the optimal strategy at the current moment, and the AUV executes the action corresponding to the optimal strategy.

2. The multi-AUV dynamic maneuver decision method based on the interval information game as claimed in claim 1, wherein in step 3 an improved differential evolution algorithm is adopted to solve for the Nash equilibrium optimal solution and find the optimal strategy; the improved differential evolution algorithm comprises the steps of mutation, crossover and selection; in order to select the best fitness in combination with the game algorithm, when a new mutation vector is generated, the scaling factor F is determined according to the evolution time and the difference between the best individual and the worst individual: