CN114548409A - Unmanned vehicle task allocation game method and device based on state potential field


Publication number
CN114548409A
CN114548409A (application CN202210116279.4A)
Authority
CN
China
Prior art keywords
strategy
game
party
matrix
unit
Prior art date
Legal status
Granted
Application number
CN202210116279.4A
Other languages
Chinese (zh)
Other versions
CN114548409B (en
Inventor
韩泽宇
王建强
刘艺璁
许庆
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210116279.4A priority Critical patent/CN114548409B/en
Publication of CN114548409A publication Critical patent/CN114548409A/en
Application granted granted Critical
Publication of CN114548409B publication Critical patent/CN114548409B/en
Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/042 Backward inferencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application discloses an unmanned vehicle task allocation game method and device based on a state potential field. The method comprises the following steps: calculating the threat degree and importance degree of each unit of the two security parties based on the current environment situation; establishing two-party strategy game matrices under at least one objective according to the strategy sets of the units of both security parties; and solving the two-party strategy game matrices until the decision requirements are met, calculating multiple strategy suggestions for multi-stage task allocation, and, while obtaining the other party's likely response under each strategy suggestion, displaying the solution results of the matrices so as to determine the optimal strategy suggestion. This addresses problems of the related art such as poor overall situation analysis, inability to respond reasonably to the other party's security strategy, insufficient reflection of the decision maker's role, long task times, and high resource consumption.

Description

Unmanned vehicle task allocation game method and device based on state potential field
Technical Field
The application relates to the technical field of security task decision, in particular to an unmanned vehicle task allocation game method and device based on a potential field.
Background
In the field of security, unmanned vehicles can independently complete given tasks, reducing labor costs, and multiple unmanned vehicles can cooperate to complete tasks such as area search and encirclement pursuit more effectively, guaranteeing the safety of users. Therefore, how to allocate to each A-party unmanned vehicle the B-party unit corresponding to its task has become a research focus in this field.
In the related art, there are mainly the following two methods:
The first is based on 0-1 integer programming: it calculates the predicted success rate of each selectable strategy and directly selects the strategy with the highest success rate. Its disadvantages are that it cannot capture the adversarial nature of the security problem, cannot give a more appropriate A-party strategy in response to the B-party strategy, and cannot predict the B-party's response.
The second is based on game theory: a payoff matrix corresponding to the strategies of both parties is established, and the A-party strategy that performs well under any B-party strategy is obtained by solving the Nash equilibrium of the game problem. The main problem with this type of approach is its slow solving speed.
However, the following problems mainly exist in the related art:
1) the antagonism of the security problem cannot be reflected, and the response to the strategy of the other party cannot be made.
2) The role of the decision maker cannot be fully embodied, and the following two points are included:
a) in the solving process, the solving can not be stopped at any time according to the requirements of decision makers, and the balance between the strategy optimality and the solving time can not be realized;
b) only one strategy can be solved, and the decision maker cannot be provided with enough strategy selection.
3) Most researches are only limited to improving the success rate of tasks, and joint analysis on multiple targets such as task time consumption and resource consumption is less;
4) Existing research that considers task time consumption simply computes the time required for an A-party unit to reach a B-party unit from the straight-line distance between them. For unmanned vehicles, this way of computing task time consumption is not reasonable enough.
Therefore, in view of the disadvantages of the related art, there is a need for further improvement of the unmanned vehicle allocation method.
Summary of the application
The application provides an unmanned vehicle task allocation game method and device based on a potential field, and aims to solve the problems that the overall situation analysis effect of the related technology is poor, reasonable response cannot be made to another party security strategy, the function of a decision maker cannot be fully reflected, the task is time-consuming, the resource consumption is high, and the like.
The embodiment of the first aspect of the application provides a task allocation gaming method for an unmanned vehicle based on a potential field, which comprises the following steps: calculating the threat degree and the importance degree of each unit of the two security parties based on the current environment situation; respectively establishing two party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, and displaying the solving result of the two-party strategy game matrix while acquiring the possible response of the opposite party under each strategy suggestion to determine the optimal strategy suggestion.
Optionally, in an embodiment of the present application, the respectively establishing two-party policy gaming matrices under at least one target includes: listing the strategy sets of each unit of the security protection two parties; calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of the two parties in the strategy set; and generating a two-party strategy game matrix under the one or more targets.
Optionally, in an embodiment of the present application, the solving the two-party policy gaming matrix includes: preprocessing the strategy game matrixes of the two parties; and solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
Optionally, in an embodiment of the present application, the preprocessing the two-party policy gaming matrix includes: carrying out normalization processing on each game matrix, and combining the game matrices according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems; based on a plurality of groups of double-matrix game problems, converting a double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming to obtain the preprocessed game problem solved by the differential evolution algorithm.
Optionally, in an embodiment of the present application, the solving the two-party policy gaming matrix includes: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
Optionally, in an embodiment of the present application, the obtaining a possible response of an opposite party under each policy suggestion and simultaneously displaying a solution result of the policy game matrix of the two parties to determine an optimal policy suggestion includes: determining response with optimal effect under a first-stage strategy according to a preset double-matrix game principle; the strategy suggestions, the specific contents of the strategies, the strategy success rate and the time consumption, the strategies at the second stage and the responses which are solved under one or more decision requirements are displayed on a visual interface in a table form; determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
An embodiment of a second aspect of the present application provides an unmanned vehicle task allocation gaming device based on a potential field, including: the computing module is used for computing the threat degree and the importance degree of each unit of the security protection two parties based on the current environment situation; the integration module is used for respectively establishing two-party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and the decision module is used for solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, acquiring the possible response of the opposite party under each strategy suggestion, displaying the solving result of the two-party strategy game matrix and determining the optimal strategy suggestion.
Optionally, in an embodiment of the present application, the integrating module includes: the enumeration unit is used for enumerating the policy sets of the units of the security protection parties; the first calculating unit is used for calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of the two strategies in the strategy set; and the generating unit is used for generating a two-party strategy game matrix under the one or more targets.
Optionally, in an embodiment of the present application, the decision module includes: the preprocessing unit is used for preprocessing the two-party strategy game matrix; and the second computing unit is used for solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
Optionally, in an embodiment of the present application, the preprocessing unit includes: the merging unit is used for carrying out normalization processing on each game matrix and merging each game matrix according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems; and the conversion unit is used for converting the double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming based on a plurality of groups of double-matrix game problems to obtain the preprocessed game problems solved by the differential evolution algorithm.
Optionally, in an embodiment of the present application, the decision module is further configured to: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
Optionally, in an embodiment of the present application, the decision module further includes: the response unit is used for determining the response with the optimal effect under the first-stage strategy according to a preset double-matrix game principle; the visualization unit is used for displaying the strategy suggestions, the specific contents of the strategies, the success rate and the time consumption of the strategies, the strategies at the second stage and the responses which are solved under one or more decision requirements on a visualization interface in a tabular form; and the decision unit is used for determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
An embodiment of a third aspect of the present application provides an electronic device, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the unmanned vehicle task allocation gaming method based on the potential field.
A fourth aspect embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, the program being executed by a processor for implementing the potential field based unmanned vehicle mission allocation gaming method according to any of claims 1-6.
The embodiment of the application establishes the potential field based on the overall situation analysis of the environment, and then calculates the strategy success rate, so that one party of security can make the best response to the strategy of the other party, and provides various strategy selections for decision makers, the action of the decision makers is fully reflected, the whole time consumption is balanced, the resource consumption can be saved, and the practicability is good. Therefore, the problems that the overall situation analysis effect of the related technology is poor, reasonable response cannot be made to another party security policy, the function of a decision maker cannot be fully reflected, time is consumed for tasks, resource consumption is high and the like are solved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a method for unmanned vehicle task allocation gaming based on a potential field according to an embodiment of the present application;
fig. 2 is a flowchart of a method for an unmanned vehicle mission allocation gaming based on a potential field according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a method for unmanned vehicle mission allocation gaming based on a potential field according to an embodiment of the present application;
FIG. 4 is a diagram of a neural network architecture for calculating threat and importance from a situation field, according to an embodiment of the present application;
FIG. 5 is a flow chart of RRT for post-pruning according to an embodiment of the present application;
fig. 6 is a diagram of an RRT planned path in a setting scenario according to an embodiment of the present application;
fig. 7 is a partial view of a game matrix of party a corresponding to a strategy success rate in a setting scenario according to an embodiment of the present application;
FIG. 8 is a flow chart of a differential evolution algorithm provided in accordance with an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating a differential evolution optimization progress according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an unmanned vehicle task allocation gaming device based on a potential field according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The method and the device for unmanned vehicle task allocation gaming based on the state potential field according to the embodiments of the application are described below with reference to the accompanying drawings. Aiming at the problems mentioned in the Background section, namely poor overall situation analysis in the related art, inability to respond reasonably to the other party's security strategy, insufficient reflection of the decision maker's role, long task times, and high resource consumption, the application provides an unmanned vehicle task allocation game method based on the situation field, thereby solving those problems.
Specifically, fig. 1 is a schematic flow chart of an unmanned vehicle task allocation gaming method based on a potential field according to an embodiment of the present application.
As shown in fig. 1, the unmanned vehicle task allocation gaming method based on the potential field includes the following steps:
in step S101, the threat level and the importance level of each unit of both security guards are calculated based on the current environmental situation.
Specifically, in the embodiment of the present application, based on the environmental potential field, the threat level and the importance level of each unit of the two security parties can be calculated, and the logical expression thereof is as follows:
{{T},{V}}=f(S),
wherein S is the environment state potential field, represented as a matrix obtained after rasterizing the map, in which the value at each point is the situation field force at the corresponding scene coordinate; {T} is the set of threat degrees of each unit of both security parties, {V} is the set of importance degrees of each unit of both security parties, and f is the mapping from the situation field to threat degree and importance degree.
In particular implementation, mapping from the state potential field to the degree of threat and importance may employ a rule-based approach or a learning-based approach, etc. If the rule-based approach is based, the following expression may be referenced:
[The rule-based expressions for the threat degree and importance degree are given as formula images in the original and are not reproduced here.]
wherein t_j is the threat degree of unit j, E_j is the situation field force at unit j, E_i is the situation field force at another unit i, and n is the total number of units on the side opposing unit j. V_j is the importance degree of unit j, and C_j is the relative similarity between unit j and the ideal unit, calculated from the potential field using the Analytic Hierarchy Process (AHP) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS).
If the threat degree and the importance degree are calculated by adopting a learning-based method, the situation field matrix can be input into a deep learning neural network, and the threat degree and the importance degree of the two units can be calculated through operations such as convolution, pooling and the like, wherein the structure of the threat degree and the importance degree is shown in fig. 4.
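As a sketch of the rule-based route, the snippet below assigns each unit a threat degree by normalizing its situation field force against the opposing units' field forces. The specific normalization used here (a unit's force divided by the sum of its own force and the opposing total) is an illustrative assumption, since the patent's exact formulas appear only as images.

```python
def threat_degrees(own_forces, opposing_forces):
    """Rule-based threat sketch (assumed formula): each unit's threat is
    its situation field force relative to the opposing side's total force.
    """
    total_opposing = sum(opposing_forces)
    return [e / (e + total_opposing) for e in own_forces]

# Illustrative field forces for three A-party and three B-party units.
a_forces = [4.0, 2.0, 6.0]
b_forces = [3.0, 3.0, 2.0]
a_threat = threat_degrees(a_forces, b_forces)
```

Units with a stronger field force relative to the opposing side receive a higher threat degree, matching the qualitative description above.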
For example, in the embodiment of the present application, a rule-based method may be adopted: in a setting scenario in which both sides have three units, the threat degree and importance degree of the units of both sides are calculated from the state potential field, giving the unit information shown in table 1, which lists the unit information of both sides in the setting scenario.
TABLE 1
[Table 1 is provided as an image in the original and is not reproduced here.]
When formulating the unmanned vehicle security strategy, the embodiment of the application proceeds from overall situation analysis and forms the best strategy according to the threat degree and importance degree of the units of both sides, which effectively guarantees the rationality and reliability of the unmanned vehicle task allocation game, meets usage demands, and ensures the user experience.
In step S102, a policy game matrix of both parties under at least one target is respectively established according to a policy set of each unit of both parties of the security protection.
In the actual execution process, the embodiment of the application can enumerate the strategy sets of the two parties according to the two security parties and establish the strategy game matrix of the two parties under at least one target. At least one of the objectives may be success rate, time consumption and resource consumption, and its specific algorithm will be described in detail below. According to the embodiment of the application, the strategy sets are respectively enumerated according to the security protection parties, and the strategy game matrix is established, so that the accuracy and the applicability of the strategy result provided by the strategy game matrix are ensured, and a decision maker can perform optimal selection.
Optionally, in an embodiment of the present application, respectively establishing a two-party policy gaming matrix under at least one target includes: listing a strategy set of each unit of both security parties; calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of two parties in the strategy set; and generating a two-party strategy game matrix under one or more targets.
As one way to implement this, when listing the strategy sets of both parties in the embodiment of the present application, the number of A-party units is set to M and the number of B-party units to N, and it is assumed that one unit can only be assigned to one unit. Each A-party unit has (N+1) optional tasks, i.e., being assigned to any one of the N B-party units, or not being assigned a task. Therefore, the A party has (N+1)^M available strategies, denoted as a set {α_1, α_2, …, α_{(N+1)^M}} (the exact notation is given as a formula image in the original),
wherein an arbitrary strategy α_u is an M-element array; each element takes a value in {0, 1, …, N}, with 0 meaning the corresponding A-party unit is not assigned a task and j meaning it is assigned to B-party unit j (the formal definition is given as a formula image in the original).
The strategy definition for party B is similar, with (M+1)^N strategies in total, denoted as {β_1, β_2, …, β_{(M+1)^N}}.
In the setting scenario, both sides have 3481 alternative strategies.
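The enumeration described above can be sketched directly: each strategy is an M-tuple whose entries range over {0, 1, …, N}, with 0 standing for "no task assigned" (function and variable names here are illustrative).

```python
from itertools import product

def enumerate_strategies(m_units, n_targets):
    """All task-allocation strategies for a side with m_units units,
    each assigned to one of n_targets opposing units (1..n_targets)
    or left unassigned (0): (n_targets + 1) ** m_units strategies."""
    return list(product(range(n_targets + 1), repeat=m_units))

# With M = 3 A-party units and N = 3 B-party units: (3 + 1) ** 3 strategies.
strategies_a = enumerate_strategies(3, 3)
```

The same call with the roles of M and N swapped enumerates the B-party strategies.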
In the embodiment of the present application, a method for calculating one or more targets of success rate, expected time consumption, and expected resource consumption of each policy of two parties in a policy set is as follows:
further, when calculating the success rate of each policy of both parties, in the embodiment of the present application, when both parties have determined the task allocation policy, the success rate of both parties calculated by the policies of both parties is defined as follows:
P_a is defined by a formula given as an image in the original (it combines the threat and importance degrees of the engaged units), and
P_b = -P_a,
where P_a is the success rate of party A and P_b is the success rate of party B; t_i^a is the threat degree of the i-th A-party unit, t_j^b is the threat degree of the j-th B-party unit, v_i^a is the importance degree of the i-th A-party unit, and v_j^b is the importance degree of the j-th B-party unit (notation reconstructed from the surrounding text; the original symbols appear only as formula images). x_ij is an indicator defined by a formula image, equal to 1 if A-party unit i is assigned to B-party unit j and 0 otherwise, and y_ij is the corresponding indicator in the B-party strategy.
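The indicator x_ij used in the payoff definitions can be built from a strategy tuple as follows (a sketch; the strategy encoding assumed is the one above, with 0 meaning "unassigned").

```python
def indicator_matrix(strategy, n_targets):
    """x[i][j] = 1 iff unit i is assigned to opposing unit j+1 under the
    given strategy tuple; an entry of 0 in the tuple means 'no task'."""
    m = len(strategy)
    x = [[0] * n_targets for _ in range(m)]
    for i, target in enumerate(strategy):
        if target > 0:  # 0 encodes "no task assigned"
            x[i][target - 1] = 1
    return x

# A-party strategy (2, 0, 1): unit 1 -> B-unit 2, unit 2 idle, unit 3 -> B-unit 1.
x = indicator_matrix((2, 0, 1), 3)
```

The analogous construction over the B-party strategy tuple yields y_ij.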
When the predicted time consumption of each strategy of the two parties is calculated, the time consumption of the task is composed of two parts: static time, i.e. the time required for task execution; dynamic time, i.e. the time required for each unit to go to the corresponding unit.
In the static time part, according to the priori knowledge, the static time can be set as a fixed value t in the embodiment of the application0
In the dynamic time part, a situation risk threshold δ may be set in the embodiment of the present application; the regions where the risk in the situation field is higher than the threshold, combined with the obstacle regions detected by sensors, are defined as the impassable area. For the passable area, a post-pruning rapidly-exploring random tree (RRT) algorithm is adopted for path planning; the algorithm flowchart and specific steps are shown in fig. 5:
First, the path tree is initialized and the starting point q_init is added to the tree; a point q_rand is randomly sampled in space. If q_rand is not in the passable area, the algorithm returns and re-samples; if it is, the algorithm continues. The point q_near in the current tree closest to q_rand is found. Starting from q_near, the tree is extended one fixed step toward q_rand to obtain a new node q_new. If the expansion edge from q_near to q_new does not pass through the impassable area, q_new is added to the tree; otherwise the node is discarded. The distance from q_new to the planned end point q_goal is then computed; if it is smaller than the set value, planning is considered finished, otherwise the algorithm returns to step S502 and continues the loop. Starting from the tree node closest to the end point q_goal, the algorithm backtracks in sequence to the starting point q_init, yielding a passable path connecting start and end. Finally, post-pruning is introduced: if the straight line connecting two points at a certain distance apart in the path-point sequence does not pass through the impassable area, the path points between them are deleted and the two points are connected directly, making the path smoother.
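The post-pruning step can be sketched as a greedy shortcut pass over the way-point sequence: whenever the straight segment between two way-points stays inside the passable area, the points between them are dropped. The `passable` predicate, the sampling resolution of the collision check, and the toy obstacle are illustrative assumptions.

```python
def segment_clear(p, q, passable, steps=50):
    """Check a straight segment p -> q by sampling points along it."""
    (x0, y0), (x1, y1) = p, q
    return all(
        passable((x0 + (x1 - x0) * k / steps, y0 + (y1 - y0) * k / steps))
        for k in range(steps + 1)
    )

def post_prune(path, passable):
    """Greedy shortcut pruning: from each kept point, jump to the farthest
    later point reachable by a clear straight segment."""
    pruned = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not segment_clear(path[i], path[j], passable):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned

# Toy passable area: everywhere except the square 2 <= x <= 3, -1 <= y <= 1.
def passable(pt):
    x, y = pt
    return not (2.0 <= x <= 3.0 and -1.0 <= y <= 1.0)

# A detour around the obstacle: the redundant intermediate point is dropped.
path = [(0, 0), (1, 0), (1, 2), (4, 2), (5, 0)]
short = post_prune(path, passable)
```

The pruned path keeps only the way-points needed to clear the obstacle, which is the smoothing effect described above.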
After the above steps are completed, a passable path between any two units is obtained. The straight-line distances between adjacent path points are summed to give the total path length, which is divided by the speed of the corresponding unit to obtain the dynamic time required for movement. Denoting by t_ij the time required for A-party unit i to reach B-party unit j under the strategy, the time required by each A-party strategy can be expressed as:
t_sum = max{ t_ij : i = 1, 2, …, M; j = 1, 2, …, N } + t_0.
the time consumption of the B-party strategy can be calculated according to similar ideas.
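Putting the two parts together, a strategy's expected time consumption is the slowest assigned unit's travel time plus the fixed execution time; a minimal sketch (variable names illustrative):

```python
def strategy_time(travel_times, t0):
    """t_sum = max over assigned pairs of t_ij, plus static time t0.
    travel_times maps (i, j) assignments to dynamic travel time;
    an empty assignment contributes only the static time."""
    return (max(travel_times.values()) if travel_times else 0.0) + t0

# Travel times (seconds) for assignments unit1 -> B2 and unit3 -> B1, t0 = 100 s.
t_sum = strategy_time({(1, 2): 42.0, (3, 1): 57.5}, 100.0)
```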
For example, in the setting scenario of the embodiment of the present application, t_0 may be set to 100 s and the situation risk threshold δ to 7; the paths planned using the post-pruning RRT algorithm are shown in fig. 6. In the figure, the horizontal and vertical axes are actual position coordinates in the environment, the rectangles are areas where obstacles or the situation risk exceed the threshold, the lower circles are the three A-party units, the upper circles are the three B-party units, and the curves are the RRT-planned paths from each A-party unit to each B-party unit. Although the planned paths contain broken lines and may not satisfy the vehicle kinematic constraints, their lengths are close to the actual ones, so they have high reference value and make the estimation of travel time more reasonable.
When calculating the predicted resource consumption of each policy of both parties, the embodiment of the present application may give a predicted resource consumption calculation formula in consideration of the number of units and their respective importance levels put into each policy as follows:
[The formulas for R_a and R_b are given as images in the original and are not reproduced here.]
wherein R_a is the resource consumption of party A and R_b is the resource consumption of party B; v_i^a is the importance degree of the i-th A-party unit and v_j^b is the importance degree of the j-th B-party unit (notation reconstructed from the surrounding text). x_ij is the indicator defined by a formula image, equal to 1 if A-party unit i is assigned to B-party unit j and 0 otherwise, and y_ij is the corresponding indicator in the B-party strategy.
By applying the above steps in the setting scenario, a pair of 81 × 81 matrices is obtained under each of the three objectives; taking the A-party strategy success rate matrix as an example, part of it is shown in fig. 7.
According to the embodiment of the application, the accuracy and the applicability of the provided strategy result are ensured by performing multi-target joint analysis, and a decision maker can perform optimal selection.
In step S103, the two-party policy game matrix is solved until the decision requirement is satisfied, a plurality of policy suggestions for multi-stage task allocation are calculated, and the solving result of the two-party policy game matrix is displayed while the possible response of the opposite party under each policy suggestion is obtained, so as to determine the optimal policy suggestion.
In the actual execution process, the strategy game matrixes of the two parties can be solved until the decision requirements are met, and a plurality of strategy suggestions for multi-stage task distribution are calculated. Meanwhile, after the possible response of the opposite party under each strategy suggestion is obtained, the solving result of the strategy game matrixes of the two parties can be displayed, and the best strategy suggestion is further determined.
Optionally, in an embodiment of the present application, solving the two-party policy game matrix includes: preprocessing the strategy game matrixes of the two parties; and solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
For example, the method for solving the two-party policy game matrix in the embodiment of the present application includes: firstly, preprocessing a strategy game matrix of two parties in the embodiment of the application; secondly, the embodiment of the application can utilize a differential evolution algorithm to solve the preprocessed two-party strategy game matrix.
Optionally, in an embodiment of the present application, preprocessing the two-party strategy game matrices includes: normalizing each game matrix and combining the matrices with different weights according to the target requirements to obtain several groups of double-matrix game problems; and, based on these double-matrix game problems, converting the double-matrix game model into a quadratic programming model according to the relation between games and mathematical programming, obtaining preprocessed game problems to be solved by the differential evolution algorithm.
It can be understood that, when the embodiment of the application is used for preprocessing, each game matrix can be subjected to normalization processing, each game matrix is combined according to different weights according to target requirements, and then a plurality of groups of double-matrix game problems are obtained, and based on the plurality of groups of double-matrix game problems, the double-matrix game model is converted into a quadratic programming model according to the relation between games and mathematical programming, so that the preprocessed game problem solved by a differential evolution algorithm is obtained.
First, the embodiment of the present application can use the range variation method to normalize the game matrix under each target (success rate, task time consumption, resource consumption) respectively. The value a_ij in the ith row and jth column of each matrix becomes, after normalization:
a'_ij = (a_ij − a_min) / (a_max − a_min),
wherein a_min is the minimum value in the matrix and a_max is the maximum value. After mapping, every value in the matrix lies between 0 and 1, so the magnitudes of values in different target matrices are consistent, which facilitates subsequent weighting and combining.
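The range normalization can be sketched in a few lines of NumPy (illustrative, not the patent's code):

```python
import numpy as np

def range_normalize(a):
    """Map every entry of matrix a into [0, 1] via (a - min) / (max - min)."""
    a = np.asarray(a, dtype=float)
    amin, amax = a.min(), a.max()
    return (a - amin) / (amax - amin)
```

Applying this to each objective matrix puts success rate, time, and resource values on a common scale before weighting.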
Secondly, the game matrixes can be combined according to different weights according to requirements, and a plurality of groups of independent double-matrix games are formed. According to the embodiment of the application, three requirements of high success rate, short consumed time and a balance strategy can be set, and the weights of all targets under different requirements can be set as:
High success rate: merged matrix = 0.9 × success rate + 0.05 × time consumption + 0.05 × resource;
Short time consumption: merged matrix = 0.7 × success rate + 0.25 × time consumption + 0.05 × resource;
Balanced strategy: merged matrix = 0.8 × success rate + 0.15 × time consumption + 0.05 × resource.
Since the success rate is always the first element, the embodiment of the present application may set the success rate weight to the maximum. In addition, matrix modeling is not required to be repeated for different requirements, and only solution is needed after respective summation.
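The weighted combination of the normalized matrices can be sketched as follows; the function name and default weights (taken from the balanced-strategy requirement above) are illustrative:

```python
import numpy as np

def merge_matrices(success, time_cost, resource, weights=(0.8, 0.15, 0.05)):
    """Combine three normalized objective matrices into one game matrix."""
    ws, wt, wr = weights
    return (ws * np.asarray(success, dtype=float)
            + wt * np.asarray(time_cost, dtype=float)
            + wr * np.asarray(resource, dtype=float))
```

As noted, only the final summation changes between requirements; the per-objective matrices are built once.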
Further, according to the embodiment of the application, the double-matrix game model can be converted into a quadratic programming model according to the relation between games and mathematical programming. Let the numbers of strategies of the two parties be m and n respectively, and the game matrices of the two parties be M = (a_ij) and N = (b_ij); the quadratic programming problem obtained after transformation is expressed as:
max f(x, y, v1, v2) = x^T (M + N) y − v1 − v2,
s.t. M y ≤ v1 · e_m, N^T x ≤ v2 · e_n, Σ_{i=1}^{m} x_i = 1, Σ_{j=1}^{n} y_j = 1, x ≥ 0, y ≥ 0,
where e_m and e_n are all-ones vectors of dimensions m and n,
wherein x = (x1, x2, …, xm) and y = (y1, y2, …, yn) are respectively the probabilities with which the two parties select each strategy in the mixed-strategy solution, and v1 and v2 are two introduced auxiliary variables. The objective function value of the quadratic programming problem is never greater than zero, and it is zero if and only if the game problem attains an optimal solution.
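The stated property, that the objective never exceeds zero and vanishes exactly at an equilibrium, can be checked numerically. The sketch below assumes the standard Mangasarian–Stone formulation of the bimatrix-game quadratic program, since the patent's program appears only in figures:

```python
import numpy as np

def qp_objective(x, y, M, N):
    """Mangasarian-Stone objective for a bimatrix game (M, N).

    Evaluated with v1 = max_i (M y)_i and v2 = max_j (N^T x)_j, the
    tightest feasible values; the result x^T (M+N) y - v1 - v2 is always
    <= 0, and equals 0 exactly at a Nash equilibrium (x, y).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    M, N = np.asarray(M, float), np.asarray(N, float)
    v1 = (M @ y).max()      # best payoff A could get against y
    v2 = (N.T @ x).max()    # best payoff B could get against x
    return x @ (M + N) @ y - v1 - v2
```

For the zero-sum matching-pennies game, the uniform mixed strategy is the equilibrium and the objective is zero there, while pure-strategy pairs give a strictly negative value.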
Finally, the embodiment of the application can solve the preprocessed game problem with a differential evolution algorithm, a kind of genetic algorithm characterized by fast convergence, simple implementation, and suitability for real-number rather than binary coding. In the embodiment of the present application, the population of the differential evolution is a set of mixed-strategy solutions of the game problem, in which each independent variable is the probability of selecting a certain strategy. The algorithm flow chart and specific steps are shown in fig. 8:
Step S801: population initialization. A set of solutions of the model to be optimized is randomly generated.
Step S802: fitness calculation. Each solution is substituted into the objective function to judge its effect.
Step S803: termination judgment. It is judged whether an optimal solution has been obtained or the specified number of iterations has been reached; if so, the optimal solution under the existing conditions is given, otherwise the loop continues.
Step S804: selection. A portion of the solutions is selected for retention according to population fitness.
Step S805: crossover. The retained solutions are exchanged pairwise with a certain probability.
Step S806: mutation. Each group of solutions is mutated through a differential strategy, with the formula:
vi(g+1)=xbest(g)+F·(xr1(g)-xr2(g)),
wherein v_i(g+1) is the new solution obtained after mutation, x_best(g) is the existing optimal solution, x_r1(g) and x_r2(g) are two randomly chosen existing solutions, and F is the mutation factor.
Finally, the steps S802 to S806 are looped until the termination condition is satisfied.
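Steps S801 to S806 can be sketched as a minimal differential evolution loop. Note that a standard implementation orders the operators mutation → crossover → selection; all names, defaults, and the simple box bounds below are illustrative assumptions, not the patent's code:

```python
import random

def differential_evolution(f, dim, bounds=(0.0, 1.0), pop_size=20,
                           F=0.5, cr=0.9, max_iter=200):
    """Minimal DE minimizer using the best/1 mutation of the formula above."""
    lo, hi = bounds
    # S801: randomly generate an initial population of candidate solutions
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(max_iter):                      # S803: loop until termination
        fitness = [f(ind) for ind in pop]          # S802: evaluate fitness
        best = pop[fitness.index(min(fitness))]
        new_pop = []
        for ind in pop:
            r1, r2 = random.sample(pop, 2)
            # S806: v = x_best + F * (x_r1 - x_r2)
            mutant = [b + F * (a - c) for b, a, c in zip(best, r1, r2)]
            # S805: crossover between mutant and current individual
            trial = [m if random.random() < cr else x
                     for m, x in zip(mutant, ind)]
            trial = [min(hi, max(lo, t)) for t in trial]
            # S804: greedy selection keeps the better of trial and parent
            new_pop.append(trial if f(trial) <= f(ind) else ind)
        pop = new_pop
    return min(pop, key=f)
```

On a smooth test objective such as the sphere function the loop converges in well under the default iteration budget.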
The embodiment of the application plans the path between the two units by using the RRT, and calculates the time required by going to the unit according to the path, so that the result is more reasonable, and the consumption of time and resources is less.
Optionally, in an embodiment of the present application, solving the two-party strategy game matrices includes: displaying in real time, in the form of a line graph, the change of the objective function value with the number of iterations, and showing the current optimization progress.
As a possible implementation manner, the embodiment of the present application may display the change of the objective function value along with the iteration number in a form of a line graph in real time, and give a current optimization progress for reference of a decision maker. If the decision maker considers that the current solving progress meets the requirement, a 'stopping solving' instruction can be issued at any time, and the device gives the optimal solution under the current progress. The progress in the optimization process under the set scenario is shown in fig. 9. According to the embodiment of the application, the action of a decision maker can be fully embodied, the solution can be stopped at any time according to the requirement of the decision maker in the solution process, and the balance between the strategy optimality and the solution time can be achieved.
Optionally, in an embodiment of the present application, the method for determining the best policy suggestion by showing the solution result of the policy game matrix of both parties while obtaining the possible response of the other party under each policy suggestion includes: determining response with optimal effect under a first-stage strategy according to a preset double-matrix game principle; the strategy suggestions, the specific contents of the strategies, the success rate and the time consumption of the strategies, the strategies at the second stage and the responses which are solved under one or more decision requirements are displayed on a visual interface in a tabular form; and determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
In the actual execution process, after the differential evolution solution is completed, a multi-stage task allocation mechanism can be introduced, and second-stage allocation is performed on the units of both parties that remain after the first-stage task allocation has been executed, so that the method better fits the requirements of practical application. First, an effect threshold γ_0 of the A-party strategy is set, and the actual effect γ of the A-party strategy is calculated by the following formula:
Figure BDA0003495659810000112
wherein T_a^i is the threat degree of the ith unit of party A, V_b^j is the importance of the jth unit of party B, and x_ij = 1 if the ith unit of party A is assigned to the jth unit of party B in the strategy, and x_ij = 0 otherwise.
the embodiment of the application can be used for comparing the actual effect gamma with the effect threshold value
Figure BDA0003495659810000116
Contrast, if
Figure BDA0003495659810000117
The strategy of the first stage is considered to have no sufficient effect on the B party, so that the planning of the second stage is needed, and a round of task distribution is developed.
Before the second-stage task allocation is calculated, the damage degree of each unit of the two parties needs to be calculated according to the first-stage strategies of both parties, to judge whether each unit can participate in the second-stage task. Here, the damage degree threshold of each unit is set as η_0, and the actual damage degree η_j of the jth unit is calculated by the following formula:
Figure BDA0003495659810000118
wherein the variables are defined as in the preceding formulas. If η_j ≥ η_0, the unit is considered too damaged after completing the first-stage task to execute the second-stage task, so it is excluded.
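The screening of second-stage units can be sketched as follows. Since the damage formula is given only as a figure, the damage degrees are taken as inputs here; names and signature are illustrative:

```python
def surviving_units(units, damage, threshold):
    """Keep only units whose actual damage degree is below the threshold.

    units: list of unit identifiers; damage: dict mapping unit -> eta_j.
    Units at or above the threshold are excluded from second-stage tasks.
    """
    return [u for u in units if damage[u] < threshold]
```

The filtered list then feeds the second-stage strategy enumeration.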
Next, in the embodiment of the present application, the policy set of the second stage of the a-party may be listed by using the related method in the above steps according to the filtered list of the remaining units of the second stage of the two parties. In order to ensure the real-time performance as much as possible and prevent the high uncertainty of long-term prediction, the task allocation in the second stage only considers the strategy of the party A, does not use a game model any more, and does not consider the possible response of the party B.
Meanwhile, due to the uncertainty of the positions of the two parties after the first-stage task is completed, the time and resource consumption of the strategy are not considered any more in the second-stage task allocation, and only the success rate is taken as the only target, and after the strategy set of the A party is listed, the strategy with the highest success rate is directly selected to be used as the strategy suggestion of the second stage. The design can also accelerate the solving speed.
Further, after the strategy of the party A is listed, possible responses of the party B under each strategy can be given, and the solving results under different strategy targets are displayed in a list form for decision makers to select.
Specifically, according to the double-matrix game principle, the embodiment of the application can give the B-party response with the optimal effect under the first-stage strategy of party A. If the strategy number selected by party A is i0, the B-party response strategy number j0 satisfies the following relationship in the B-party game matrix N:
j0 = argmax_j N[i0, j],
and information such as strategy suggestions, strategy specific contents, strategy success rate and time consumption, second-stage strategies, predicted B party responses and the like solved by the A party under the three requirements of high success rate, short time consumption and balanced strategies is displayed on a visual interface of the device in a tabular form for reference and selection of decision makers. The policy suggestion table in the setting scenario is shown in table 2, where table 2 is the policy suggestion table.
TABLE 2
Figure BDA0003495659810000121
In the above table, in the two columns "A-party allocation suggestion" and "first-stage expected B-party response", the units before the arrows are assigned to the units after the arrows in the strategy. If a unit in the strategy points to 0, that unit is not allocated a task; this prevents too many units from executing tasks and consuming excessive resources.
A specific embodiment of the present application will be described in detail with reference to fig. 2 to 9.
As shown in fig. 2, a specific embodiment of the present application includes the following steps:
s201: based on the environment state potential field, the threat degree and the importance degree of each unit of both the security protection parties are calculated. The logic expression is as follows:
{{T},{V}}=f(S),
wherein S is an environment state potential field, and S belongs to R2The form of the method is a matrix obtained after map rasterization, and the numerical value of each point of the matrix is the situation field force of the corresponding scene coordinate point. { T } is a threat degree set of each unit of both security sides, { V } is an importance degree set of each unit of both security sides, and f is a mapping relation of threat degree and importance degree obtained according to situation fields.
In particular implementations, the mapping from the state potential field to the threat degree and importance degree can adopt a rule-based method, a learning-based method, or the like. If a rule-based method is used, the following expressions may be referenced:
Figure BDA0003495659810000131
Figure BDA0003495659810000132
wherein t_j is the threat degree of unit j, E_j is the situation field force at unit j, E_i is the situation field force at another unit i, and n is the total number of units other than unit j. V_j is the importance degree of unit j, and C_j is the relative closeness of unit j to the optimal unit, calculated from the state potential field using the Analytic Hierarchy Process (AHP) and the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS).
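The TOPSIS relative-closeness computation named above can be sketched as follows; the patent does not give its exact criteria or weighting, so this assumes an already weighted matrix of benefit-type scores:

```python
import numpy as np

def topsis_closeness(scores):
    """Relative closeness of each alternative to the ideal solution (TOPSIS).

    scores: n_alternatives x n_criteria matrix of benefit-type, already
    weighted/normalized scores. Returns C_j in [0, 1]; larger is better.
    """
    s = np.asarray(scores, dtype=float)
    ideal = s.max(axis=0)            # positive ideal solution
    anti = s.min(axis=0)             # negative ideal solution
    d_pos = np.linalg.norm(s - ideal, axis=1)
    d_neg = np.linalg.norm(s - anti, axis=1)
    return d_neg / (d_pos + d_neg)
```

An alternative equal to the column-wise maxima gets closeness 1, and one equal to the minima gets 0.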
If a learning-based method is used to calculate the threat degree and importance degree, the situation field matrix can be input into a deep neural network, and the threat degree and importance degree of the units of both parties are calculated through convolution, pooling, and similar operations; the network structure is shown in fig. 4.
In the embodiment of the application, a rule-based method can be adopted, a scene with three units for both parties is set, and the threat degree and the importance degree of the units for both parties are calculated according to the state potential field to obtain the unit information of both parties.
S202: enumerating the strategy sets of the two parties, and establishing a strategy game matrix of the two parties under the three targets of success rate, time consumption and resource consumption. The unit number of the A side is set as M, the unit number of the B side is set as N, and one unit can be only distributed to one unit. There are (+1) optional tasks per unit on party a, i.e., assigning to any unit on party B, or not assigning a task). Therefore, the strategy available for the A party is (N +1)MIt is expressed as:
Figure BDA0003495659810000137
wherein an arbitrary strategy α_u is an M-element array, and each element of the array is defined as:
Figure BDA0003495659810000133
The strategy definitions of party B are similar; there are (M+1)^N in total, recorded as:
Figure BDA0003495659810000138
In the setting scenario, each side has 81 alternative strategies.
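The strategy enumeration can be sketched with itertools; each strategy is an M-element array whose ith entry names the B-party unit assigned to A-unit i, with 0 meaning no task (illustrative, not the patent's code):

```python
from itertools import product

def enumerate_strategies(m, n):
    """All task-allocation strategies for a side with m units.

    Each unit is assigned to one of the n opposing units (1..n) or to
    no task (0), giving (n+1)**m strategies in total.
    """
    return list(product(range(n + 1), repeat=m))
```

The returned list indexes the rows (or columns) of the game matrices built below.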
When the success rate of each strategy is calculated in the embodiment of the present application, once both parties have determined their task allocation strategies, the success rates of the two parties under those strategies are defined as follows:
Figure BDA0003495659810000134
Pb=-Pa
wherein P_a is the success rate of party A, P_b is the success rate of party B, T_a^i is the threat degree of the ith unit of party A, T_b^j is the threat degree of the jth unit of party B, V_a^i is the importance of the ith unit of party A, V_b^j is the importance of the jth unit of party B, and x_ij = 1 if the ith unit of party A is assigned to the jth unit of party B in the strategy and x_ij = 0 otherwise; y_ij is the corresponding index in the strategy of party B.
When the predicted time consumption of each strategy of the two parties is calculated, the time consumption of a task consists of two parts: static time, i.e., the time required for task execution; and dynamic time, i.e., the time required for each unit to travel to its corresponding unit.
For the static time part, according to prior knowledge, the embodiment of the present application may set the static time to a fixed value t0.
For the dynamic time part, the embodiment of the present application may set a situation risk threshold δ; the region where the risk in the situation field exceeds the threshold, combined with the obstacle region detected by sensors, is defined as the impassable region. For the passable region, a post-pruned rapidly-exploring random tree (RRT) algorithm is adopted for path planning; the algorithm flow chart and specific steps are shown in fig. 5:
First, the path tree is initialized: the starting point q_init is added to the tree, and a point q_rand is sampled randomly in space. If q_rand is not in the passable region, sampling is repeated; otherwise the process continues. The point q_near in the current tree closest to q_rand is found, and the tree is expanded one fixed step from q_near toward q_rand to obtain a new node q_new. If the expansion edge q_near-q_new does not pass through the impassable region, q_new is added to the tree; otherwise the node is discarded. The distance from the new node q_new to the planned end point q_goal is then computed; if it is smaller than the set value, planning is considered finished, otherwise the process returns to step S502 and the loop continues. Starting from the tree node nearest the end point q_goal, the path is traced back node by node to the starting point q_init, giving a passable path connecting the start and end points. Finally, post-pruning is introduced: if the straight line connecting two points some distance apart in the path-point sequence does not pass through the impassable region, the intermediate path points are deleted and the two points are connected directly, making the path smoother.
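The RRT loop with post-pruning can be sketched as follows; the collision test blocked(p, q), the sampling bounds, and all parameter defaults are assumptions, not the patent's code:

```python
import math
import random

def rrt_post_pruned(start, goal, blocked, step=1.0, goal_tol=1.0,
                    max_iter=5000, xmax=20.0, ymax=20.0):
    """Post-pruned RRT sketch.

    blocked(p, q) -> True if the segment p-q crosses the impassable area
    (call with p == q to test a single sample point).
    Returns a list of waypoints from start to goal, or None on failure.
    """
    random.seed(1)                              # deterministic for the sketch
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iter):
        q_rand = (random.uniform(0, xmax), random.uniform(0, ymax))
        if blocked(q_rand, q_rand):             # resample outside passable area
            continue
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d == 0:
            continue
        t = min(1.0, step / d)                  # fixed-step expansion
        q_new = (q_near[0] + t * (q_rand[0] - q_near[0]),
                 q_near[1] + t * (q_rand[1] - q_near[1]))
        if blocked(q_near, q_new):              # discard colliding edges
            continue
        nodes.append(q_new)
        parent[len(nodes) - 1] = i_near
        if math.dist(q_new, goal) < goal_tol:   # close enough: backtrack
            path, i = [goal], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            path.reverse()
            # post-pruning: jump to the farthest directly reachable waypoint
            pruned, i = [path[0]], 0
            while i < len(path) - 1:
                j = len(path) - 1
                while j > i + 1 and blocked(path[i], path[j]):
                    j -= 1
                pruned.append(path[j])
                i = j
            return pruned
    return None
```

With no obstacles, pruning collapses the zig-zag tree path to a single straight segment from start to goal.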
After the above steps are completed, the passable path between any two units can be obtained. The straight-line distances between adjacent path points are summed to obtain the total path length, which is divided by the speed of the corresponding unit to give the dynamic time required for movement. The time required for unit i of party A to travel to unit j of party B under the strategy is denoted t_ij, so the time required by a strategy of party A can be expressed as:
t_sum = max{t_ij} + t0; i = 1, 2, …, M; j = 1, 2, …, N.
the time consumption of the B-party strategy can be calculated according to similar ideas.
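The time formula t_sum = max{t_ij} + t0 can be sketched as follows; path and speed containers are illustrative assumptions:

```python
import math

def strategy_time(paths, speeds, t0):
    """Predicted time of a strategy: max unit travel time plus static time t0.

    paths: dict (i, j) -> list of waypoints for A-unit i going to B-unit j;
    speeds: dict i -> speed of A-unit i.
    """
    def travel(i, j):
        pts = paths[(i, j)]
        dist = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))
        return dist / speeds[i]
    return max(travel(i, j) for (i, j) in paths) + t0
```

The maximum is taken because the strategy finishes only when its slowest unit arrives.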
In the setting scenario, t0 is set to 100 s and the situation risk threshold δ to 7; the path planned using the post-pruned RRT algorithm is shown in fig. 6. In the figure, the horizontal and vertical axes are actual position coordinates in the environment, the rectangles are areas where obstacles lie or the situation risk exceeds the threshold, the lower circles are the three units of party A, the upper circles are the three units of party B, and the curves are the paths planned by the RRT from each unit of party A to each unit of party B. Although the planned paths contain broken lines and may not satisfy the vehicle kinematic constraints, their lengths are close to the actual conditions, so they have high reference value and are reasonable for estimating the moving time.
When calculating the predicted resource consumption of each strategy of both parties, the embodiment of the present application may give a predicted resource consumption calculation formula that takes into account the number of units invested in each strategy and their respective importance degrees:
Figure BDA0003495659810000151
Figure BDA0003495659810000152
wherein, each variable is defined as: raFor resource consumption of party A, ReIn order to achieve the resource consumption of the B-party,
Figure BDA0003495659810000153
for the ith unit importance of party a,
Figure BDA0003495659810000154
is the jth unit importance of the B-side, xijIs defined as:
Figure BDA0003495659810000155
yijand the indexes are corresponding indexes in the strategy of the B party.
By applying the above steps in the setting scenario, a pair of 81 × 81-dimensional game matrices can be obtained under each of the three targets. Taking the strategy success rate matrix of party A as an example, part of it is shown in fig. 7.
S203: preprocess the game matrices (simplification and conversion) and solve optimally using a differential evolution algorithm. First, the game matrices under each target (success rate, task time consumption, resource consumption) can be normalized respectively using the range variation method. The value a_ij in the ith row and jth column of each matrix becomes, after normalization:
a'_ij = (a_ij − a_min) / (a_max − a_min),
wherein a_min is the minimum value in the matrix and a_max is the maximum value. After mapping, every value in the matrix lies between 0 and 1, so the magnitudes of values in different target matrices are consistent, which facilitates subsequent weighting and combining.
Secondly, the game matrixes can be combined according to different weights according to requirements, and a plurality of groups of independent double-matrix games are formed. According to the embodiment of the application, three requirements of high success rate, short consumed time and a balance strategy can be set, and the weights of all targets under different requirements can be set as:
High success rate: merged matrix = 0.9 × success rate + 0.05 × time consumption + 0.05 × resource;
Short time consumption: merged matrix = 0.7 × success rate + 0.25 × time consumption + 0.05 × resource;
Balanced strategy: merged matrix = 0.8 × success rate + 0.15 × time consumption + 0.05 × resource.
Since the success rate is always the first element, the embodiment of the present application may set the success rate weight to the maximum. In addition, matrix modeling is not required to be repeated for different requirements, and only solution is needed after respective summation.
Further, according to the embodiment of the application, the double-matrix game model can be converted into a quadratic programming model according to the relation between games and mathematical programming. Let the numbers of strategies of the two parties be m and n respectively, and the game matrices of the two parties be M = (a_ij) and N = (b_ij); the quadratic programming problem obtained after transformation is expressed as:
max f(x, y, v1, v2) = x^T (M + N) y − v1 − v2,
s.t. M y ≤ v1 · e_m, N^T x ≤ v2 · e_n, Σ_{i=1}^{m} x_i = 1, Σ_{j=1}^{n} y_j = 1, x ≥ 0, y ≥ 0,
where e_m and e_n are all-ones vectors of dimensions m and n,
wherein x = (x1, x2, …, xm) and y = (y1, y2, …, yn) are respectively the probabilities with which the two parties select each strategy in the mixed-strategy solution, and v1 and v2 are two introduced auxiliary variables. The objective function value of the quadratic programming problem is never greater than zero, and it is zero if and only if the game problem attains an optimal solution.
Finally, the embodiment of the application can solve the preprocessed game problem with a differential evolution algorithm, a kind of genetic algorithm characterized by fast convergence, simple implementation, and suitability for real-number rather than binary coding. In the embodiment of the present application, the population of the differential evolution is a set of mixed-strategy solutions of the game problem, in which each independent variable is the probability of selecting a certain strategy. The algorithm flow chart and specific steps are shown in fig. 8:
Step S801: population initialization. A set of solutions of the model to be optimized is randomly generated.
Step S802: fitness calculation. Each solution is substituted into the objective function to judge its effect.
Step S803: termination judgment. It is judged whether an optimal solution has been obtained or the specified number of iterations has been reached; if so, the optimal solution under the existing conditions is given, otherwise the loop continues.
Step S804: selection. A portion of the solutions is selected for retention according to population fitness.
Step S805: crossover. The retained solutions are exchanged pairwise with a certain probability.
Step S806: mutation. Each group of solutions is mutated through a differential strategy, with the formula:
vi(g+1)=xbest(g)+F·(xr1(g)-xr2(g)),
wherein v_i(g+1) is the new solution obtained after mutation, x_best(g) is the existing optimal solution, x_r1(g) and x_r2(g) are two randomly chosen existing solutions, and F is the mutation factor.
And looping the steps S802 to S806 until the termination condition is met.
S204: and displaying the solving progress in real time, stopping solving at any time according to the requirements of a decision maker, and calculating the strategy suggestion of multi-stage task allocation. The embodiment of the application can display the change of the objective function value along with the iteration times in real time in the form of a line graph, and provides the current optimization progress for a decision maker to refer to. If the decision maker considers that the current solving progress meets the requirement, a 'stopping solving' instruction can be issued at any time, and the device gives the optimal solution under the current progress. The progress in the optimization process under the set scenario is shown in fig. 9.
S205: give the possible responses of party B under each suggested strategy, and display the solution results under the different strategy targets in list form for the decision maker to select. After the differential evolution solution is completed, a multi-stage task allocation mechanism can be introduced, and second-stage allocation is performed on the units of both parties that remain after the first-stage task allocation is completed, so as to meet the requirements of practical application. First, an effect threshold γ_0 of the A-party strategy is set, and the actual effect γ of the A-party strategy is calculated by the following formula:
Figure BDA0003495659810000171
wherein T_a^i is the threat degree of the ith unit of party A, V_b^j is the importance of the jth unit of party B, and x_ij = 1 if the ith unit of party A is assigned to the jth unit of party B in the strategy, and x_ij = 0 otherwise.
the embodiment of the application can be used for comparing the actual effect gamma with the effect threshold value
Figure BDA0003495659810000175
In contrast, if
Figure BDA0003495659810000176
The strategy of the first stage is considered to have no sufficient effect on the B party, so that the planning of the second stage is needed, and a round of task distribution is developed.
Before the second-stage task allocation is calculated, the damage degree of each unit of the two parties needs to be calculated according to the first-stage strategies of both parties, to judge whether each unit can participate in the second-stage task. Here, the damage degree threshold of each unit is set as η_0, and the actual damage degree η_j of the jth unit is calculated by the following formula:
Figure BDA0003495659810000177
wherein the variables are defined as in the preceding formulas. If η_j ≥ η_0, the unit is considered too damaged after completing the first-stage task to execute the second-stage task, so it is excluded.
Next, in the embodiment of the present application, the policy set of the second stage of the a-party may be listed by using the related method in the above steps according to the filtered list of the remaining units of the second stage of the two parties. In order to ensure the real-time performance as much as possible and prevent the high uncertainty of long-term prediction, the task allocation in the second stage only considers the strategy of the party A, does not use a game model any more, and does not consider the possible response of the party B.
Meanwhile, due to the uncertainty of the positions of the two parties after the first-stage task is completed, the time and resource consumption of the strategy are not considered any more in the second-stage task allocation, and only the success rate is taken as the only target, and after the strategy set of the A party is listed, the strategy with the highest success rate is directly selected to be used as the strategy suggestion of the second stage. The design can also accelerate the solving speed.
Further, after the strategy of the party A is listed, possible responses of the party B under each strategy can be given, and the solving results under different strategy targets are displayed in a list form for decision makers to select.
Specifically, according to the double-matrix game principle, the embodiment of the application can give the B-party response with the optimal effect under the first-stage strategy of party A. If the strategy number selected by party A is i0, the B-party response strategy number j0 satisfies the following relationship in the B-party game matrix N:
j0 = argmax_j N[i0, j],
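The best-response rule j0 = argmax_j N[i0, j] is a one-liner with NumPy (illustrative):

```python
import numpy as np

def best_response(N, i0):
    """B's best response j0 = argmax_j N[i0, j] to A's strategy i0."""
    return int(np.argmax(np.asarray(N)[i0]))
```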
and the information solved for party A under the three requirements of high success rate, short time consumption and balanced strategy, including strategy suggestions, specific strategy contents, strategy success rate and time consumption, second-stage strategies and predicted party-B responses, is displayed in tabular form on the visualization interface of the device for the decision maker's reference and selection.
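The best-response relation j0 = arg max_j N[i0, j] above can be sketched directly. The matrix values below are invented for illustration.

```python
# Sketch of the relation j0 = arg max_j N[i0, j]: given party B's game matrix N
# and party A's chosen strategy index i0, party B's predicted response is the
# column index with the highest payoff in row i0. Matrix values are invented.

def predict_b_response(N, i0):
    """N: party B's payoff matrix (list of rows); i0: party A's strategy index."""
    row = N[i0]
    return max(range(len(row)), key=lambda j: row[j])

N = [[3, 1, 2],
     [0, 4, 1]]
print(predict_b_response(N, 0))  # 0 (payoff 3 is largest in row 0)
print(predict_b_response(N, 1))  # 1 (payoff 4 is largest in row 1)
```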
According to the unmanned vehicle task allocation game method based on the state potential field, a potential field is established from a global situation analysis of the environment and used to calculate strategy success rates, so that one security party can make the best response to the other party's strategy. Multiple strategy options are provided to the decision maker, fully reflecting the decision maker's role, while overall time consumption is balanced and resource loss is reduced, giving the method good practicability. This addresses the problems in the related art of poor global situation analysis, inability to respond reasonably to the other party's security strategy, failure to reflect the decision maker's role, and high task time and resource consumption.
The unmanned vehicle task allocation gaming device based on the potential field is described next with reference to the attached drawings.
Fig. 10 is a block schematic diagram of an unmanned vehicle mission allocation gaming device based on a state potential field according to an embodiment of the present application.
As shown in fig. 10, the state potential field based unmanned vehicle task allocation gaming device 10 includes: a calculation module 100, an integration module 200 and a decision module 300.
Specifically, the calculating module 100 is configured to calculate the threat degree and the importance degree of each unit of the two security protection parties based on the current environment situation.
And the integrating module 200 is configured to respectively establish a policy game matrix of both parties under at least one target according to a policy set of each unit of both parties of the security protection.
The decision module 300 is configured to solve the two-party policy game matrix until a decision requirement is met, calculate a plurality of policy suggestions for multi-stage task allocation, obtain a possible response of the opposite party under each policy suggestion, display a solution result of the two-party policy game matrix, and determine an optimal policy suggestion.
Optionally, in an embodiment of the present application, the integrating module 200 includes: the device comprises an enumeration unit, a first calculation unit and a generation unit.
The enumeration unit is used for enumerating policy sets of each unit of both security protection parties.
The first calculating unit is used for calculating one or more targets among success rate, estimated time consumption and estimated resource consumption for each strategy of the two parties in the strategy set.
And the generating unit is used for generating a two-party strategy game matrix under one or more targets.
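The three steps above (enumerate the strategy sets, evaluate each objective per strategy pair, assemble the matrices) can be sketched as follows. The evaluator functions, strategy names and values are placeholder assumptions; in the patent these quantities come from the potential-field situation analysis.

```python
# Illustrative sketch of generating one game matrix per objective (success rate,
# estimated time, estimated resource consumption), indexed by
# (party-A strategy, party-B strategy). The evaluators and values below are
# invented placeholders for the patent's potential-field-based calculations.

def build_game_matrices(a_strategies, b_strategies, evaluators):
    """evaluators: dict of objective name -> f(a_strategy, b_strategy) -> value."""
    return {
        name: [[f(a, b) for b in b_strategies] for a in a_strategies]
        for name, f in evaluators.items()
    }

success = {("A1", "B1"): 0.8, ("A1", "B2"): 0.6,
           ("A2", "B1"): 0.5, ("A2", "B2"): 0.9}
matrices = build_game_matrices(
    ["A1", "A2"], ["B1", "B2"],
    {"success_rate": lambda a, b: success[(a, b)]})
print(matrices["success_rate"])  # [[0.8, 0.6], [0.5, 0.9]]
```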
Optionally, in an embodiment of the present application, the decision module 300 includes: a preprocessing unit and a second computing unit.
The preprocessing unit is used for preprocessing the strategy game matrixes of the two parties.
And the second computing unit is used for solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
Optionally, in an embodiment of the present application, the preprocessing unit includes: a merging subunit and a conversion subunit.
The merging subunit is used for normalizing each game matrix and merging the matrices with different weights according to the target requirements, obtaining multiple groups of double-matrix game problems.
The conversion subunit is used for converting, based on the multiple groups of double-matrix game problems and the relation between games and mathematical programming, the double-matrix game model into a quadratic programming model, obtaining the preprocessed game problem to be solved by the differential evolution algorithm.
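As a concrete illustration of this preprocessing and solving pipeline, the sketch below normalizes and weight-merges per-objective matrices, then searches for a mixed-strategy equilibrium of a double-matrix game with a small hand-rolled differential evolution loop. The patent's exact quadratic programming formulation is not reproduced here; instead the sketch minimizes an equivalent "Nash gap" objective that is non-negative and zero exactly at an equilibrium. All parameter values are illustrative assumptions.

```python
import math
import random

def normalize(M):
    """Min-max normalize a matrix to [0, 1]."""
    flat = [v for row in M for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in M]

def merge(mats, weights):
    """Weighted elementwise sum of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(w * M[i][j] for M, w in zip(mats, weights))
             for j in range(cols)] for i in range(rows)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def nash_gap(A, B, x, y):
    """Total gain available to either player by deviating; 0 iff (x, y) is an equilibrium."""
    Ay = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    xB = [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y))]
    vA = sum(x[i] * Ay[i] for i in range(len(x)))
    vB = sum(y[j] * xB[j] for j in range(len(y)))
    return (max(Ay) - vA) + (max(xB) - vB)

def solve_bimatrix_de(A, B, pop_size=30, gens=200, F=0.7, CR=0.9, seed=1):
    """Differential evolution over strategy logits; softmax keeps strategies on the simplex."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    dim = m + n
    obj = lambda v: nash_gap(A, B, softmax(v[:m]), softmax(v[m:]))
    pop = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size)]
    fit = [obj(p) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            jr = rng.randrange(dim)  # guarantees at least one mutated component
            trial = [pop[a][d] + F * (pop[b][d] - pop[c][d])
                     if (rng.random() < CR or d == jr) else pop[i][d]
                     for d in range(dim)]
            f = obj(trial)
            if f < fit[i]:
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=fit.__getitem__)
    return softmax(pop[best][:m]), softmax(pop[best][m:]), fit[best]

# Matching pennies as a toy double-matrix game: the unique mixed equilibrium
# plays each strategy with probability 0.5 for both parties.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
x, y, gap = solve_bimatrix_de(A, B)
print(round(gap, 3), [round(p, 2) for p in x], [round(p, 2) for p in y])
```

The normalize/merge helpers correspond to the merging subunit; the sign convention when mixing maximize objectives (success rate) with minimize objectives (time, resources) is left to the caller and is an assumption not specified above.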
Optionally, in an embodiment of the present application, the decision module 300 is further configured to: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
Optionally, in an embodiment of the present application, the decision module 300 further includes: the system comprises a response unit, a visualization unit and a decision unit.
The response unit is used for determining the response with the optimal effect under the first-stage strategy according to a preset double-matrix game principle.
And the visualization unit is used for displaying the strategy suggestions, the strategy specific contents, the strategy success rate and time consumption, the second-stage strategies and the responses which are solved under one or more decision requirements on a visualization interface in a tabular form.
And the decision unit is used for determining the optimal strategy suggestion according to a selection instruction generated by the decision maker based on the information on the visual interface.
It should be noted that the foregoing explanation of the state potential field based unmanned vehicle task allocation gaming method embodiment also applies to the state potential field based unmanned vehicle task allocation gaming device of this embodiment, and details are not repeated here.
According to the unmanned vehicle task allocation gaming device based on the state potential field, a potential field is established from a global situation analysis of the environment and used to calculate strategy success rates, so that one security party can make the best response to the other party's strategy. Multiple strategy options are provided to the decision maker, fully reflecting the decision maker's role, while overall time consumption is balanced and resource consumption is reduced, giving the device good practicability. This addresses the problems in the related art of poor global situation analysis, inability to respond reasonably to the other party's security strategy, failure to reflect the decision maker's role, and high task time and resource consumption.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 1101, a processor 1102, and a computer program stored on the memory 1101 and executable on the processor 1102.
The processor 1102, when executing the program, implements the state potential field based unmanned vehicle task allocation gaming method provided in the embodiments above.
Further, the electronic device further includes:
a communication interface 1103 for communicating between the memory 1101 and the processor 1102.
A memory 1101 for storing computer programs that are executable on the processor 1102.
The memory 1101 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk storage device.
If the memory 1101, the processor 1102 and the communication interface 1103 are implemented independently, the communication interface 1103, the memory 1101 and the processor 1102 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 1101, the processor 1102 and the communication interface 1103 are integrated on one chip, the memory 1101, the processor 1102 and the communication interface 1103 may complete communication with each other through an internal interface.
The processor 1102 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the state potential field based unmanned vehicle task allocation gaming method as described above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (14)

1. An unmanned vehicle task allocation gaming method based on a state potential field, characterized by comprising the following steps:
calculating the threat degree and the importance degree of each unit of the two security parties based on the current environment situation;
respectively establishing two party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and
and solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, and displaying the solving result of the two-party strategy game matrix while acquiring the possible response of the opposite party under each strategy suggestion to determine the optimal strategy suggestion.
2. The method of claim 1, wherein the respectively establishing a two-party policy gaming matrix under at least one target comprises:
listing the strategy sets of each unit of the security protection two parties;
calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of the two parties in the strategy set;
and generating a two-party strategy game matrix under the one or more targets.
3. The method of claim 1, wherein said solving said two-party strategy game matrix comprises:
preprocessing the strategy game matrixes of the two parties;
and solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
4. The method of claim 3, wherein the pre-processing the two-party policy gaming matrix comprises:
carrying out normalization processing on each game matrix, and combining the game matrices according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems;
based on a plurality of groups of double-matrix game problems, converting a double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming to obtain the preprocessed game problem solved by the differential evolution algorithm.
5. The method of claim 1, wherein said solving said two-party strategy game matrix comprises:
and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
6. The method according to claim 4 or 5, wherein the step of displaying the solution result of the policy game matrix of the two parties while obtaining the possible response of the other party under each policy suggestion and determining the best policy suggestion comprises the steps of:
determining response with optimal effect under a first-stage strategy according to a preset double-matrix game principle;
the strategy suggestions, the specific contents of the strategies, the strategy success rate and the time consumption, the strategies at the second stage and the responses which are solved under one or more decision requirements are displayed on a visual interface in a table form;
and determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
7. An unmanned vehicle task allocation gaming device based on a state potential field, comprising:
the computing module is used for computing the threat degree and the importance degree of each unit of the security protection two parties based on the current environment situation;
the integration module is used for respectively establishing two-party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and
and the decision module is used for solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, acquiring the possible response of the opposite party under each strategy suggestion, displaying the solving result of the two-party strategy game matrix and determining the optimal strategy suggestion.
8. The apparatus of claim 7, wherein the integrating module comprises:
the enumeration unit is used for enumerating the policy sets of the units of the security protection parties;
the first calculating unit is used for calculating one or more targets among success rate, estimated time consumption and estimated resource consumption for each strategy of the two parties in the strategy set;
and the generating unit is used for generating a two-party strategy game matrix under the one or more targets.
9. The apparatus of claim 7, wherein the decision module comprises:
the preprocessing unit is used for preprocessing the two-party strategy game matrix;
and the second computing unit is used for solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
10. The apparatus of claim 9, wherein the pre-processing unit comprises:
the merging unit is used for carrying out normalization processing on each game matrix and merging each game matrix according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems;
and the conversion unit is used for converting the double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming based on a plurality of groups of double-matrix game problems to obtain the preprocessed game problems solved by the differential evolution algorithm.
11. The apparatus of claim 7, wherein the decision module is further configured to: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
12. The apparatus of claim 10 or 11, wherein the decision module further comprises:
the response unit is used for determining the response with the optimal effect under the first-stage strategy according to a preset double-matrix game principle;
the visualization unit is used for displaying the strategy suggestions, the specific contents of the strategies, the success rate and the time consumption of the strategies, the strategies at the second stage and the responses which are solved under one or more decision requirements on a visualization interface in a tabular form;
and the decision unit is used for determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
13. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the state potential field based unmanned vehicle task allocation gaming method of any one of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor to implement the state potential field based unmanned vehicle task allocation gaming method of any one of claims 1-6.
CN202210116279.4A 2022-01-30 2022-01-30 Unmanned vehicle task allocation game method and device based on state potential field Active CN114548409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210116279.4A CN114548409B (en) 2022-01-30 2022-01-30 Unmanned vehicle task allocation game method and device based on state potential field

Publications (2)

Publication Number Publication Date
CN114548409A true CN114548409A (en) 2022-05-27
CN114548409B CN114548409B (en) 2023-01-10

Family

ID=81672838


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2271047A1 (en) * 2009-06-22 2011-01-05 Deutsche Telekom AG Game theoretic recommendation system and method for security alert dissemination
CN107623697A (en) * 2017-10-11 2018-01-23 北京邮电大学 A kind of network security situation evaluating method based on attacking and defending Stochastic Game Model
CN110119773A (en) * 2019-05-07 2019-08-13 中国科学院自动化研究所 Global Situation Assessment side's method, the system, device of Strategic Games system
CN110443473A (en) * 2019-07-22 2019-11-12 合肥工业大学 Multiple no-manned plane collaboration target assignment method and system under Antagonistic Environment
CN113114492A (en) * 2021-04-01 2021-07-13 哈尔滨理工大学 Security situation perception algorithm based on Markov differential game block chain model
CN113282061A (en) * 2021-04-25 2021-08-20 南京大学 Unmanned aerial vehicle air game countermeasure solving method based on course learning
CN113625740A (en) * 2021-08-27 2021-11-09 北京航空航天大学 Unmanned aerial vehicle air combat game method based on transfer learning pigeon swarm optimization
CN113938394A (en) * 2021-12-14 2022-01-14 清华大学 Monitoring service bandwidth allocation method and device, electronic equipment and storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YANGMING KANG et al.: "Beyond-Visual-Range Tactical Game Strategy for Multiple UAVs", IEEE *
YIYANG WANG et al.: "Adversarial Online Learning With Variable Plays in the Pursuit-Evasion Game: Theoretical Foundations and Application in Connected and Automated Vehicle Cybersecurity", IEEE Access *
WANG Ershen et al.: "UAV Swarm Air-Ground Confrontation Model with an Improved Target Payoff Function", Journal of Nanjing University of Aeronautics and Astronautics *
CHEN Zihan et al.: "Moving Target Defense Technology Based on a Stackelberg-Markov Asymmetric Three-Party Game Model", Chinese Journal of Computers *

Also Published As

Publication number Publication date
CN114548409B (en) 2023-01-10

Similar Documents

Publication Publication Date Title
Guimarães et al. The two-echelon multi-depot inventory-routing problem
US20240054444A1 (en) Logistics scheduling method and system for industrial park based on game theory
Mufalli et al. Simultaneous sensor selection and routing of unmanned aerial vehicles for complex mission plans
Han et al. Appointment scheduling and routing optimization of attended home delivery system with random customer behavior
Faratin et al. Using similarity criteria to make issue trade-offs in automated negotiations
Gupta et al. A polynomial goal programming approach for intuitionistic fuzzy portfolio optimization using entropy and higher moments
Cope Bayesian strategies for dynamic pricing in e‐commerce
CN108446978A (en) Handle the method and device of transaction data
Gerding et al. Multi-issue negotiation processes by evolutionary simulation, validation and social extensions
CN112328646B (en) Multitask course recommendation method and device, computer equipment and storage medium
CN110472792A (en) A kind of route optimizing method for logistic distribution vehicle based on discrete bat algorithm
CN110222824B (en) Intelligent algorithm model autonomous generation and evolution method, system and device
WO2022163003A1 (en) Model generation device, estimation device, model generation method, and model generation program
Aras et al. Bilevel models on the competitive facility location problem
CN114548409B (en) Unmanned vehicle task allocation game method and device based on state potential field
CN108650191B (en) Decision method for mapping strategy in virtual network
JP2005536788A (en) How to determine the value given to different parameters of the system
Djenadi et al. Energy-aware task allocation strategy for multi robot system
Asmuni et al. A novel fuzzy approach to evaluate the quality of examination timetabling
El-Hussieny et al. Robotic exploration: new heuristic backtracking algorithm, performance evaluation and complexity metric
EP3961507A1 (en) Optimal policy learning and recommendation for distribution task using deep reinforcement learning model
CN112445621A (en) Static routing planning method and device, electronic equipment and storage medium
Ramkishore et al. Optimal bargaining mechanisms with refusal cost
Brázdil et al. Solvency Markov decision processes with interest
Kolomvatsos et al. Automatic fuzzy rules generation for the deadline calculation of a seller agent

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant