CN114548409A - Unmanned vehicle task allocation game method and device based on state potential field


Publication number
CN114548409A
CN114548409A (application CN202210116279.4A)
Authority
CN
China
Prior art keywords
strategy
game
party
matrix
unit
Prior art date
Legal status
Granted
Application number
CN202210116279.4A
Other languages
Chinese (zh)
Other versions
CN114548409B (en
Inventor
韩泽宇
王建强
刘艺璁
许庆
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210116279.4A priority Critical patent/CN114548409B/en
Publication of CN114548409A publication Critical patent/CN114548409A/en
Application granted granted Critical
Publication of CN114548409B publication Critical patent/CN114548409B/en
Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/042 Backward inferencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application discloses an unmanned vehicle task allocation game method and device based on a state potential field. The method comprises the following steps: calculating the threat degree and importance degree of each unit of the two security parties based on the current environment situation; establishing two-party strategy game matrices under at least one objective according to the strategy sets of the units of both security parties; and solving the two-party strategy game matrices until the decision requirements are met, calculating multiple strategy suggestions for multi-stage task allocation, and, while obtaining the other party's likely response under each strategy suggestion, displaying the solution results of the matrices so as to determine the optimal strategy suggestion. This addresses problems of the related art such as poor overall situation analysis, inability to respond reasonably to the other party's security strategy, insufficient reflection of the decision maker's role, long task times, and high resource consumption.

Description

Unmanned vehicle task allocation game method and device based on state potential field
Technical Field
The application relates to the technical field of security task decision, in particular to an unmanned vehicle task allocation game method and device based on a potential field.
Background
In the field of security, unmanned vehicles can independently complete given tasks, reducing labor costs, and multiple unmanned vehicles can cooperate to complete tasks such as area search and encirclement pursuit more effectively, guaranteeing the safety of users. Therefore, how to allocate to each A-party unmanned vehicle the B-party unit corresponding to its task has become a research focus in this field.
In the related art, there are mainly the following two methods:
The first is based on 0-1 integer programming: it calculates the predicted success rate of each selectable strategy and directly selects the strategy with the highest success rate. Its disadvantages are that it cannot capture the adversarial nature of the security problem, cannot give a more appropriate A-party strategy in response to the B-party strategy, and cannot predict the B-party's response.
The second is based on game theory: a payoff matrix corresponding to the strategies of both parties is established, and the A-party strategy that performs well under any B-party strategy is obtained by solving the Nash equilibrium of the game problem. The main problem with this type of approach is its slow solving speed.
However, the following problems mainly exist in the related art:
1) the antagonism of the security problem cannot be reflected, and the response to the strategy of the other party cannot be made.
2) The role of the decision maker cannot be fully embodied, and the following two points are included:
a) in the solving process, the solving can not be stopped at any time according to the requirements of decision makers, and the balance between the strategy optimality and the solving time can not be realized;
b) only one strategy can be solved, and the decision maker cannot be provided with enough strategy selection.
3) Most researches are only limited to improving the success rate of tasks, and joint analysis on multiple targets such as task time consumption and resource consumption is less;
4) Existing research that considers task time consumption simply computes the time required for an A-party unit to reach a B-party unit from the straight-line distance between them. For unmanned vehicles, this way of computing task time consumption is not reasonable enough.
Therefore, in view of the disadvantages of the related art, there is a need for further improvement of the unmanned vehicle allocation method.
Summary of the application
The application provides an unmanned vehicle task allocation game method and device based on a potential field, and aims to solve the problems that the overall situation analysis effect of the related technology is poor, reasonable response cannot be made to another party security strategy, the function of a decision maker cannot be fully reflected, the task is time-consuming, the resource consumption is high, and the like.
The embodiment of the first aspect of the application provides a task allocation gaming method for an unmanned vehicle based on a potential field, which comprises the following steps: calculating the threat degree and the importance degree of each unit of the two security parties based on the current environment situation; respectively establishing two party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, and displaying the solving result of the two-party strategy game matrix while acquiring the possible response of the opposite party under each strategy suggestion to determine the optimal strategy suggestion.
Optionally, in an embodiment of the present application, the respectively establishing two-party policy gaming matrices under at least one target includes: listing the strategy sets of each unit of the security protection two parties; calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of the two parties in the strategy set; and generating a two-party strategy game matrix under the one or more targets.
Optionally, in an embodiment of the present application, the solving the two-party policy gaming matrix includes: preprocessing the strategy game matrixes of the two parties; and solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
Optionally, in an embodiment of the present application, the preprocessing the two-party policy gaming matrix includes: carrying out normalization processing on each game matrix, and combining the game matrices according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems; based on a plurality of groups of double-matrix game problems, converting a double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming to obtain the preprocessed game problem solved by the differential evolution algorithm.
Optionally, in an embodiment of the present application, the solving the two-party policy gaming matrix includes: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
Optionally, in an embodiment of the present application, the obtaining a possible response of an opposite party under each policy suggestion and simultaneously displaying a solution result of the policy game matrix of the two parties to determine an optimal policy suggestion includes: determining response with optimal effect under a first-stage strategy according to a preset double-matrix game principle; the strategy suggestions, the specific contents of the strategies, the strategy success rate and the time consumption, the strategies at the second stage and the responses which are solved under one or more decision requirements are displayed on a visual interface in a table form; determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
An embodiment of a second aspect of the present application provides an unmanned vehicle task allocation gaming device based on a potential field, including: the computing module is used for computing the threat degree and the importance degree of each unit of the security protection two parties based on the current environment situation; the integration module is used for respectively establishing two-party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and the decision module is used for solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, acquiring the possible response of the opposite party under each strategy suggestion, displaying the solving result of the two-party strategy game matrix and determining the optimal strategy suggestion.
Optionally, in an embodiment of the present application, the integrating module includes: the enumeration unit is used for enumerating the policy sets of the units of the security protection parties; the first calculating unit is used for calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of the two strategies in the strategy set; and the generating unit is used for generating a two-party strategy game matrix under the one or more targets.
Optionally, in an embodiment of the present application, the decision module includes: the preprocessing unit is used for preprocessing the two-party strategy game matrix; and the second computing unit is used for solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
Optionally, in an embodiment of the present application, the preprocessing unit includes: the merging unit is used for carrying out normalization processing on each game matrix and merging each game matrix according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems; and the conversion unit is used for converting the double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming based on a plurality of groups of double-matrix game problems to obtain the preprocessed game problems solved by the differential evolution algorithm.
Optionally, in an embodiment of the present application, the decision module is further configured to: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
Optionally, in an embodiment of the present application, the decision module further includes: the response unit is used for determining the response with the optimal effect under the first-stage strategy according to a preset double-matrix game principle; the visualization unit is used for displaying the strategy suggestions, the specific contents of the strategies, the success rate and the time consumption of the strategies, the strategies at the second stage and the responses which are solved under one or more decision requirements on a visualization interface in a tabular form; and the decision unit is used for determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
An embodiment of a third aspect of the present application provides an electronic device, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the unmanned vehicle task allocation gaming method based on the potential field.
A fourth aspect embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, the program being executed by a processor for implementing the potential field based unmanned vehicle mission allocation gaming method according to any of claims 1-6.
The embodiment of the application establishes the potential field based on the overall situation analysis of the environment, and then calculates the strategy success rate, so that one party of security can make the best response to the strategy of the other party, and provides various strategy selections for decision makers, the action of the decision makers is fully reflected, the whole time consumption is balanced, the resource consumption can be saved, and the practicability is good. Therefore, the problems that the overall situation analysis effect of the related technology is poor, reasonable response cannot be made to another party security policy, the function of a decision maker cannot be fully reflected, time is consumed for tasks, resource consumption is high and the like are solved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a method for unmanned vehicle task allocation gaming based on a potential field according to an embodiment of the present application;
fig. 2 is a flowchart of a method for an unmanned vehicle mission allocation gaming based on a potential field according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a method for unmanned vehicle mission allocation gaming based on a potential field according to an embodiment of the present application;
FIG. 4 is a diagram of a neural network architecture for calculating threat and importance from a situation field, according to an embodiment of the present application;
FIG. 5 is a flow chart of RRT for post-pruning according to an embodiment of the present application;
fig. 6 is a diagram of an RRT planned path in a setting scenario according to an embodiment of the present application;
fig. 7 is a partial view of a game matrix of party a corresponding to a strategy success rate in a setting scenario according to an embodiment of the present application;
FIG. 8 is a flow chart of a differential evolution algorithm provided in accordance with an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating a differential evolution optimization progress according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an unmanned vehicle task allocation gaming device based on a potential field according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The method and the device for unmanned vehicle task allocation gaming based on the state potential field according to the embodiments of the application are described below with reference to the accompanying drawings. Aiming at the problems mentioned in the Background section, namely poor overall situation analysis in the related art, inability to respond reasonably to the other party's security strategy, insufficient reflection of the decision maker's role, long task times, and high resource consumption, the application provides an unmanned vehicle task allocation game method based on the situation field, thereby solving those problems.
Specifically, fig. 1 is a schematic flow chart of an unmanned vehicle task allocation gaming method based on a potential field according to an embodiment of the present application.
As shown in fig. 1, the unmanned vehicle task allocation gaming method based on the potential field includes the following steps:
in step S101, the threat level and the importance level of each unit of both security guards are calculated based on the current environmental situation.
Specifically, in the embodiment of the present application, based on the environmental potential field, the threat level and the importance level of each unit of the two security parties can be calculated, and the logical expression thereof is as follows:
{{T},{V}}=f(S),
wherein S is the environment state potential field, represented as a matrix obtained after rasterizing the map, in which the value at each point is the situation field force at the corresponding scene coordinate; {T} is the set of threat degrees of each unit of both security parties, {V} is the set of importance degrees of each unit of both security parties, and f is the mapping from the situation field to threat degree and importance degree.
In particular implementation, mapping from the state potential field to the degree of threat and importance may employ a rule-based approach or a learning-based approach, etc. If the rule-based approach is based, the following expression may be referenced:
[The rule-based expressions for the threat degree and importance degree are given as formula images in the original and are not reproduced here.]
wherein t_j is the threat degree of unit j, E_j is the situation field force at unit j, E_i is the situation field force at another unit i, and n is the total number of units on the side opposing unit j. V_j is the importance degree of unit j, and C_j is the relative similarity between unit j and the ideal unit, calculated from the potential field using the Analytic Hierarchy Process (AHP) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS).
If the threat degree and the importance degree are calculated by adopting a learning-based method, the situation field matrix can be input into a deep learning neural network, and the threat degree and the importance degree of the two units can be calculated through operations such as convolution, pooling and the like, wherein the structure of the threat degree and the importance degree is shown in fig. 4.
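As a sketch of the rule-based route, the snippet below assigns each unit a threat degree by normalizing its situation field force against the opposing units' field forces. The specific normalization used here (a unit's force divided by the sum of its own force and the opposing total) is an illustrative assumption, since the patent's exact formulas appear only as images.

```python
def threat_degrees(own_forces, opposing_forces):
    """Rule-based threat sketch (assumed formula): each unit's threat is
    its situation field force relative to the opposing side's total force.
    """
    total_opposing = sum(opposing_forces)
    return [e / (e + total_opposing) for e in own_forces]

# Illustrative field forces for three A-party and three B-party units.
a_forces = [4.0, 2.0, 6.0]
b_forces = [3.0, 3.0, 2.0]
a_threat = threat_degrees(a_forces, b_forces)
```

Units with a stronger field force relative to the opposing side receive a higher threat degree, matching the qualitative description above.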
For example, in the embodiment of the present application, a rule-based method may be adopted: in a setting scenario in which both sides have three units, the threat degree and importance degree of the units of both sides are calculated from the state potential field, giving the unit information shown in table 1, which lists the unit information of both sides in the setting scenario.
TABLE 1
[Table 1 is provided as an image in the original and is not reproduced here.]
When formulating the unmanned vehicle security strategy, the embodiment of the application proceeds from overall situation analysis and forms the best strategy according to the threat degree and importance degree of the units of both sides, which effectively guarantees the rationality and reliability of the unmanned vehicle task allocation game, meets usage demands, and ensures the user experience.
In step S102, a policy game matrix of both parties under at least one target is respectively established according to a policy set of each unit of both parties of the security protection.
In the actual execution process, the embodiment of the application can enumerate the strategy sets of the two parties according to the two security parties and establish the strategy game matrix of the two parties under at least one target. At least one of the objectives may be success rate, time consumption and resource consumption, and its specific algorithm will be described in detail below. According to the embodiment of the application, the strategy sets are respectively enumerated according to the security protection parties, and the strategy game matrix is established, so that the accuracy and the applicability of the strategy result provided by the strategy game matrix are ensured, and a decision maker can perform optimal selection.
Optionally, in an embodiment of the present application, respectively establishing a two-party policy gaming matrix under at least one target includes: listing a strategy set of each unit of both security parties; calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of two parties in the strategy set; and generating a two-party strategy game matrix under one or more targets.
As one way to implement this, when listing the strategy sets of both parties in the embodiment of the present application, the number of A-party units is set to M and the number of B-party units to N, and it is assumed that one unit can only be assigned to one unit. Each A-party unit has (N+1) optional tasks, i.e., being assigned to any one of the N B-party units, or not being assigned a task. Therefore, the A party has (N+1)^M available strategies, denoted as a set {α_1, α_2, …, α_{(N+1)^M}} (the exact notation is given as a formula image in the original),
wherein an arbitrary strategy α_u is an M-element array; each element takes a value in {0, 1, …, N}, with 0 meaning the corresponding A-party unit is not assigned a task and j meaning it is assigned to B-party unit j (the formal definition is given as a formula image in the original).
The strategy definition for party B is similar, with (M+1)^N strategies in total, denoted as {β_1, β_2, …, β_{(M+1)^N}}.
In the setting scenario, both sides have 3481 alternative strategies.
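The enumeration described above can be sketched directly: each strategy is an M-tuple whose entries range over {0, 1, …, N}, with 0 standing for "no task assigned" (function and variable names here are illustrative).

```python
from itertools import product

def enumerate_strategies(m_units, n_targets):
    """All task-allocation strategies for a side with m_units units,
    each assigned to one of n_targets opposing units (1..n_targets)
    or left unassigned (0): (n_targets + 1) ** m_units strategies."""
    return list(product(range(n_targets + 1), repeat=m_units))

# With M = 3 A-party units and N = 3 B-party units: (3 + 1) ** 3 strategies.
strategies_a = enumerate_strategies(3, 3)
```

The same call with the roles of M and N swapped enumerates the B-party strategies.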
In the embodiment of the present application, a method for calculating one or more targets of success rate, expected time consumption, and expected resource consumption of each policy of two parties in a policy set is as follows:
further, when calculating the success rate of each policy of both parties, in the embodiment of the present application, when both parties have determined the task allocation policy, the success rate of both parties calculated by the policies of both parties is defined as follows:
P_a is defined by a formula given as an image in the original (it combines the threat and importance degrees of the engaged units), and
P_b = -P_a,
where P_a is the success rate of party A and P_b is the success rate of party B; t_i^a is the threat degree of the i-th A-party unit, t_j^b is the threat degree of the j-th B-party unit, v_i^a is the importance degree of the i-th A-party unit, and v_j^b is the importance degree of the j-th B-party unit (notation reconstructed from the surrounding text; the original symbols appear only as formula images). x_ij is an indicator defined by a formula image, equal to 1 if A-party unit i is assigned to B-party unit j and 0 otherwise, and y_ij is the corresponding indicator in the B-party strategy.
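The indicator x_ij used in the payoff definitions can be built from a strategy tuple as follows (a sketch; the strategy encoding assumed is the one above, with 0 meaning "unassigned").

```python
def indicator_matrix(strategy, n_targets):
    """x[i][j] = 1 iff unit i is assigned to opposing unit j+1 under the
    given strategy tuple; an entry of 0 in the tuple means 'no task'."""
    m = len(strategy)
    x = [[0] * n_targets for _ in range(m)]
    for i, target in enumerate(strategy):
        if target > 0:  # 0 encodes "no task assigned"
            x[i][target - 1] = 1
    return x

# A-party strategy (2, 0, 1): unit 1 -> B-unit 2, unit 2 idle, unit 3 -> B-unit 1.
x = indicator_matrix((2, 0, 1), 3)
```

The analogous construction over the B-party strategy tuple yields y_ij.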
When the predicted time consumption of each strategy of the two parties is calculated, the time consumption of the task is composed of two parts: static time, i.e. the time required for task execution; dynamic time, i.e. the time required for each unit to go to the corresponding unit.
In the static time part, according to the priori knowledge, the static time can be set as a fixed value t in the embodiment of the application0
In the dynamic time part, a situation risk threshold δ may be set in the embodiment of the present application; the regions where the risk in the situation field is higher than the threshold, combined with the obstacle regions detected by sensors, are defined as the impassable area. For the passable area, a post-pruning rapidly-exploring random tree (RRT) algorithm is adopted for path planning; the algorithm flowchart and specific steps are shown in fig. 5:
First, the path tree is initialized and the starting point q_init is added to the tree; a point q_rand is randomly sampled in space. If q_rand is not in the passable area, the algorithm returns and re-samples; if it is, the algorithm continues. The point q_near in the current tree closest to q_rand is found. Starting from q_near, the tree is extended one fixed step toward q_rand to obtain a new node q_new. If the expansion edge from q_near to q_new does not pass through the impassable area, q_new is added to the tree; otherwise the node is discarded. The distance from q_new to the planned end point q_goal is then computed; if it is smaller than the set value, planning is considered finished, otherwise the algorithm returns to step S502 and continues the loop. Starting from the tree node closest to the end point q_goal, the algorithm backtracks in sequence to the starting point q_init, yielding a passable path connecting start and end. Finally, post-pruning is introduced: if the straight line connecting two points at a certain distance apart in the path-point sequence does not pass through the impassable area, the path points between them are deleted and the two points are connected directly, making the path smoother.
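The post-pruning step can be sketched as a greedy shortcut pass over the way-point sequence: whenever the straight segment between two way-points stays inside the passable area, the points between them are dropped. The `passable` predicate, the sampling resolution of the collision check, and the toy obstacle are illustrative assumptions.

```python
def segment_clear(p, q, passable, steps=50):
    """Check a straight segment p -> q by sampling points along it."""
    (x0, y0), (x1, y1) = p, q
    return all(
        passable((x0 + (x1 - x0) * k / steps, y0 + (y1 - y0) * k / steps))
        for k in range(steps + 1)
    )

def post_prune(path, passable):
    """Greedy shortcut pruning: from each kept point, jump to the farthest
    later point reachable by a clear straight segment."""
    pruned = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not segment_clear(path[i], path[j], passable):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned

# Toy passable area: everywhere except the square 2 <= x <= 3, -1 <= y <= 1.
def passable(pt):
    x, y = pt
    return not (2.0 <= x <= 3.0 and -1.0 <= y <= 1.0)

# A detour around the obstacle: the redundant intermediate point is dropped.
path = [(0, 0), (1, 0), (1, 2), (4, 2), (5, 0)]
short = post_prune(path, passable)
```

The pruned path keeps only the way-points needed to clear the obstacle, which is the smoothing effect described above.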
After the above steps are completed, a passable path between any two units is obtained. The straight-line distances between adjacent path points are summed to give the total path length, which is divided by the speed of the corresponding unit to obtain the dynamic time required for movement. Denoting by t_ij the time required for A-party unit i to reach B-party unit j under the strategy, the time required by each A-party strategy can be expressed as:
t_sum = max{ t_ij : i = 1, 2, …, M; j = 1, 2, …, N } + t_0.
the time consumption of the B-party strategy can be calculated according to similar ideas.
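Putting the two parts together, a strategy's expected time consumption is the slowest assigned unit's travel time plus the fixed execution time; a minimal sketch (variable names illustrative):

```python
def strategy_time(travel_times, t0):
    """t_sum = max over assigned pairs of t_ij, plus static time t0.
    travel_times maps (i, j) assignments to dynamic travel time;
    an empty assignment contributes only the static time."""
    return (max(travel_times.values()) if travel_times else 0.0) + t0

# Travel times (seconds) for assignments unit1 -> B2 and unit3 -> B1, t0 = 100 s.
t_sum = strategy_time({(1, 2): 42.0, (3, 1): 57.5}, 100.0)
```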
For example, in the setting scenario of the embodiment of the present application, t_0 may be set to 100 s and the situation risk threshold δ to 7; the paths planned using the post-pruning RRT algorithm are shown in fig. 6. In the figure, the horizontal and vertical axes are actual position coordinates in the environment, the rectangles are areas where obstacles or the situation risk exceed the threshold, the lower circles are the three A-party units, the upper circles are the three B-party units, and the curves are the RRT-planned paths from each A-party unit to each B-party unit. Although the planned paths contain broken lines and may not satisfy the vehicle kinematic constraints, their lengths are close to the actual ones, so they have high reference value and make the estimation of travel time more reasonable.
When calculating the predicted resource consumption of each policy of both parties, the embodiment of the present application may give a predicted resource consumption calculation formula in consideration of the number of units and their respective importance levels put into each policy as follows:
[The formulas for R_a and R_b are given as images in the original and are not reproduced here.]
wherein R_a is the resource consumption of party A and R_b is the resource consumption of party B; v_i^a is the importance degree of the i-th A-party unit and v_j^b is the importance degree of the j-th B-party unit (notation reconstructed from the surrounding text). x_ij is the indicator defined by a formula image, equal to 1 if A-party unit i is assigned to B-party unit j and 0 otherwise, and y_ij is the corresponding indicator in the B-party strategy.
By applying the above steps in the setting scenario, a pair of 81 × 81 matrices is obtained under each of the three objectives; taking the A-party strategy success rate matrix as an example, part of it is shown in fig. 7.
According to the embodiment of the application, the accuracy and the applicability of the provided strategy result are ensured by performing multi-target joint analysis, and a decision maker can perform optimal selection.
In step S103, the two-party policy game matrix is solved until the decision requirement is satisfied, a plurality of policy suggestions for multi-stage task allocation are calculated, and the solving result of the two-party policy game matrix is displayed while the possible response of the opposite party under each policy suggestion is obtained, so as to determine the optimal policy suggestion.
In the actual execution process, the strategy game matrixes of the two parties can be solved until the decision requirements are met, and a plurality of strategy suggestions for multi-stage task distribution are calculated. Meanwhile, after the possible response of the opposite party under each strategy suggestion is obtained, the solving result of the strategy game matrixes of the two parties can be displayed, and the best strategy suggestion is further determined.
Optionally, in an embodiment of the present application, solving the two-party policy game matrix includes: preprocessing the strategy game matrixes of the two parties; and solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
For example, the method for solving the two-party policy game matrix in the embodiment of the present application includes: firstly, preprocessing a strategy game matrix of two parties in the embodiment of the application; secondly, the embodiment of the application can utilize a differential evolution algorithm to solve the preprocessed two-party strategy game matrix.
Optionally, in an embodiment of the present application, preprocessing the two-party strategy game matrices includes: normalizing each game matrix and combining the matrices with different weights according to the target requirements to obtain several groups of double-matrix game problems; and, based on these double-matrix game problems, converting the double-matrix game model into a quadratic programming model according to the relation between games and mathematical programming, obtaining preprocessed game problems to be solved by the differential evolution algorithm.
It can be understood that, when the embodiment of the application is used for preprocessing, each game matrix can be subjected to normalization processing, each game matrix is combined according to different weights according to target requirements, and then a plurality of groups of double-matrix game problems are obtained, and based on the plurality of groups of double-matrix game problems, the double-matrix game model is converted into a quadratic programming model according to the relation between games and mathematical programming, so that the preprocessed game problem solved by a differential evolution algorithm is obtained.
First, the embodiment of the present application can use the range variation method to normalize the game matrix under each target (success rate, task time consumption, resource consumption) respectively. The value a_ij in the ith row and jth column of each matrix becomes, after normalization:
a'_ij = (a_ij − a_min) / (a_max − a_min),
wherein a_min is the minimum value in the matrix and a_max is the maximum value. After mapping, every value in the matrix lies between 0 and 1, so the magnitudes of values in different target matrices are consistent, which facilitates subsequent weighting and combining.
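The range normalization can be sketched in a few lines of NumPy (illustrative, not the patent's code):

```python
import numpy as np

def range_normalize(a):
    """Map every entry of matrix a into [0, 1] via (a - min) / (max - min)."""
    a = np.asarray(a, dtype=float)
    amin, amax = a.min(), a.max()
    return (a - amin) / (amax - amin)
```

Applying this to each objective matrix puts success rate, time, and resource values on a common scale before weighting.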
Secondly, the game matrixes can be combined according to different weights according to requirements, and a plurality of groups of independent double-matrix games are formed. According to the embodiment of the application, three requirements of high success rate, short consumed time and a balance strategy can be set, and the weights of all targets under different requirements can be set as:
High success rate: merged matrix = 0.9 × success rate + 0.05 × time consumption + 0.05 × resource;
Short time consumption: merged matrix = 0.7 × success rate + 0.25 × time consumption + 0.05 × resource;
Balanced strategy: merged matrix = 0.8 × success rate + 0.15 × time consumption + 0.05 × resource.
Since the success rate is always the first element, the embodiment of the present application may set the success rate weight to the maximum. In addition, matrix modeling is not required to be repeated for different requirements, and only solution is needed after respective summation.
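The weighted combination of the normalized matrices can be sketched as follows; the function name and default weights (taken from the balanced-strategy requirement above) are illustrative:

```python
import numpy as np

def merge_matrices(success, time_cost, resource, weights=(0.8, 0.15, 0.05)):
    """Combine three normalized objective matrices into one game matrix."""
    ws, wt, wr = weights
    return (ws * np.asarray(success, dtype=float)
            + wt * np.asarray(time_cost, dtype=float)
            + wr * np.asarray(resource, dtype=float))
```

As noted, only the final summation changes between requirements; the per-objective matrices are built once.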
Further, according to the embodiment of the application, the double-matrix game model can be converted into a quadratic programming model according to the relation between games and mathematical programming. Let the numbers of strategies of the two parties be m and n respectively, and the game matrices of the two parties be M = (a_ij) and N = (b_ij); the quadratic programming problem obtained after transformation is expressed as:
max f(x, y, v1, v2) = x^T (M + N) y − v1 − v2,
s.t. M y ≤ v1 · e_m, N^T x ≤ v2 · e_n, Σ_{i=1}^{m} x_i = 1, Σ_{j=1}^{n} y_j = 1, x ≥ 0, y ≥ 0,
where e_m and e_n are all-ones vectors of dimensions m and n,
wherein x = (x1, x2, …, xm) and y = (y1, y2, …, yn) are respectively the probabilities with which the two parties select each strategy in the mixed-strategy solution, and v1 and v2 are two introduced auxiliary variables. The objective function value of the quadratic programming problem is never greater than zero, and it is zero if and only if the game problem attains an optimal solution.
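The stated property, that the objective never exceeds zero and vanishes exactly at an equilibrium, can be checked numerically. The sketch below assumes the standard Mangasarian–Stone formulation of the bimatrix-game quadratic program, since the patent's program appears only in figures:

```python
import numpy as np

def qp_objective(x, y, M, N):
    """Mangasarian-Stone objective for a bimatrix game (M, N).

    Evaluated with v1 = max_i (M y)_i and v2 = max_j (N^T x)_j, the
    tightest feasible values; the result x^T (M+N) y - v1 - v2 is always
    <= 0, and equals 0 exactly at a Nash equilibrium (x, y).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    M, N = np.asarray(M, float), np.asarray(N, float)
    v1 = (M @ y).max()      # best payoff A could get against y
    v2 = (N.T @ x).max()    # best payoff B could get against x
    return x @ (M + N) @ y - v1 - v2
```

For the zero-sum matching-pennies game, the uniform mixed strategy is the equilibrium and the objective is zero there, while pure-strategy pairs give a strictly negative value.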
Finally, the embodiment of the application can solve the preprocessed game problem with a differential evolution algorithm, a kind of genetic algorithm characterized by fast convergence, simple implementation, and suitability for real-number rather than binary coding. In the embodiment of the present application, the population of the differential evolution is a set of mixed-strategy solutions of the game problem, in which each independent variable is the probability of selecting a certain strategy. The algorithm flow chart and specific steps are shown in fig. 8:
Step S801: population initialization. A set of solutions of the model to be optimized is randomly generated.
Step S802: fitness calculation. Each solution is substituted into the objective function to judge its effect.
Step S803: termination judgment. It is judged whether an optimal solution has been obtained or the specified number of iterations has been reached; if so, the optimal solution under the existing conditions is given, otherwise the loop continues.
Step S804: selection. A portion of the solutions is selected for retention according to population fitness.
Step S805: crossover. The retained solutions are exchanged pairwise with a certain probability.
Step S806: mutation. Each group of solutions is mutated through a differential strategy, with the formula:
vi(g+1)=xbest(g)+F·(xr1(g)-xr2(g)),
wherein v_i(g+1) is the new solution obtained after mutation, x_best(g) is the existing optimal solution, x_r1(g) and x_r2(g) are two randomly chosen existing solutions, and F is the mutation factor.
Finally, the steps S802 to S806 are looped until the termination condition is satisfied.
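Steps S801 to S806 can be sketched as a minimal differential evolution loop. Note that a standard implementation orders the operators mutation → crossover → selection; all names, defaults, and the simple box bounds below are illustrative assumptions, not the patent's code:

```python
import random

def differential_evolution(f, dim, bounds=(0.0, 1.0), pop_size=20,
                           F=0.5, cr=0.9, max_iter=200):
    """Minimal DE minimizer using the best/1 mutation of the formula above."""
    lo, hi = bounds
    # S801: randomly generate an initial population of candidate solutions
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(max_iter):                      # S803: loop until termination
        fitness = [f(ind) for ind in pop]          # S802: evaluate fitness
        best = pop[fitness.index(min(fitness))]
        new_pop = []
        for ind in pop:
            r1, r2 = random.sample(pop, 2)
            # S806: v = x_best + F * (x_r1 - x_r2)
            mutant = [b + F * (a - c) for b, a, c in zip(best, r1, r2)]
            # S805: crossover between mutant and current individual
            trial = [m if random.random() < cr else x
                     for m, x in zip(mutant, ind)]
            trial = [min(hi, max(lo, t)) for t in trial]
            # S804: greedy selection keeps the better of trial and parent
            new_pop.append(trial if f(trial) <= f(ind) else ind)
        pop = new_pop
    return min(pop, key=f)
```

On a smooth test objective such as the sphere function the loop converges in well under the default iteration budget.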
The embodiment of the application plans the path between the two units by using the RRT, and calculates the time required by going to the unit according to the path, so that the result is more reasonable, and the consumption of time and resources is less.
Optionally, in an embodiment of the present application, solving the two-party strategy game matrices includes: displaying in real time, in the form of a line graph, the change of the objective function value with the number of iterations, and showing the current optimization progress.
As a possible implementation manner, the embodiment of the present application may display the change of the objective function value along with the iteration number in a form of a line graph in real time, and give a current optimization progress for reference of a decision maker. If the decision maker considers that the current solving progress meets the requirement, a 'stopping solving' instruction can be issued at any time, and the device gives the optimal solution under the current progress. The progress in the optimization process under the set scenario is shown in fig. 9. According to the embodiment of the application, the action of a decision maker can be fully embodied, the solution can be stopped at any time according to the requirement of the decision maker in the solution process, and the balance between the strategy optimality and the solution time can be achieved.
Optionally, in an embodiment of the present application, the method for determining the best policy suggestion by showing the solution result of the policy game matrix of both parties while obtaining the possible response of the other party under each policy suggestion includes: determining response with optimal effect under a first-stage strategy according to a preset double-matrix game principle; the strategy suggestions, the specific contents of the strategies, the success rate and the time consumption of the strategies, the strategies at the second stage and the responses which are solved under one or more decision requirements are displayed on a visual interface in a tabular form; and determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
In the actual execution process, after the differential evolution solution is completed, a multi-stage task allocation mechanism can be introduced, and second-stage allocation is performed on the units of both parties that remain after the first-stage task allocation has been executed, so that the method better fits the requirements of practical application. First, an effect threshold γ_0 of the A-party strategy is set, and the actual effect γ of the A-party strategy is calculated by the following formula:
Figure BDA0003495659810000112
wherein T_a^i is the threat degree of the ith unit of party A, V_b^j is the importance of the jth unit of party B, and x_ij = 1 if the ith unit of party A is assigned to the jth unit of party B in the strategy, and x_ij = 0 otherwise.
the embodiment of the application can be used for comparing the actual effect gamma with the effect threshold value
Figure BDA0003495659810000116
Contrast, if
Figure BDA0003495659810000117
The strategy of the first stage is considered to have no sufficient effect on the B party, so that the planning of the second stage is needed, and a round of task distribution is developed.
Before the second-stage task allocation is calculated, the damage degree of each unit of the two parties needs to be calculated according to the first-stage strategies of both parties, to judge whether each unit can participate in the second-stage task. Here, the damage degree threshold of each unit is set as η_0, and the actual damage degree η_j of the jth unit is calculated by the following formula:
Figure BDA0003495659810000118
wherein the variables are defined as in the preceding formulas. If η_j ≥ η_0, the unit is considered too damaged after completing the first-stage task to execute the second-stage task, so it is excluded.
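The screening of second-stage units can be sketched as follows. Since the damage formula is given only as a figure, the damage degrees are taken as inputs here; names and signature are illustrative:

```python
def surviving_units(units, damage, threshold):
    """Keep only units whose actual damage degree is below the threshold.

    units: list of unit identifiers; damage: dict mapping unit -> eta_j.
    Units at or above the threshold are excluded from second-stage tasks.
    """
    return [u for u in units if damage[u] < threshold]
```

The filtered list then feeds the second-stage strategy enumeration.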
Next, in the embodiment of the present application, the policy set of the second stage of the a-party may be listed by using the related method in the above steps according to the filtered list of the remaining units of the second stage of the two parties. In order to ensure the real-time performance as much as possible and prevent the high uncertainty of long-term prediction, the task allocation in the second stage only considers the strategy of the party A, does not use a game model any more, and does not consider the possible response of the party B.
Meanwhile, due to the uncertainty of the positions of the two parties after the first-stage task is completed, the time and resource consumption of the strategy are not considered any more in the second-stage task allocation, and only the success rate is taken as the only target, and after the strategy set of the A party is listed, the strategy with the highest success rate is directly selected to be used as the strategy suggestion of the second stage. The design can also accelerate the solving speed.
Further, after the strategy of the party A is listed, possible responses of the party B under each strategy can be given, and the solving results under different strategy targets are displayed in a list form for decision makers to select.
Specifically, according to the double-matrix game principle, the embodiment of the application can give the B-party response with the optimal effect under the first-stage strategy of party A. If the strategy number selected by party A is i0, the B-party response strategy number j0 satisfies the following relationship in the B-party game matrix N:
j0 = argmax_j N[i0, j],
and information such as strategy suggestions, strategy specific contents, strategy success rate and time consumption, second-stage strategies, predicted B party responses and the like solved by the A party under the three requirements of high success rate, short time consumption and balanced strategies is displayed on a visual interface of the device in a tabular form for reference and selection of decision makers. The policy suggestion table in the setting scenario is shown in table 2, where table 2 is the policy suggestion table.
TABLE 2
Figure BDA0003495659810000121
In the above table, in the two columns "A-party allocation suggestion" and "first-stage expected B-party response", the units before the arrows are assigned to the units after the arrows in the strategy. If a unit in the strategy points to 0, that unit is not allocated a task; this prevents too many units from executing tasks and consuming excessive resources.
A specific embodiment of the present application will be described in detail with reference to fig. 2 to 9.
As shown in fig. 2, a specific embodiment of the present application includes the following steps:
s201: based on the environment state potential field, the threat degree and the importance degree of each unit of both the security protection parties are calculated. The logic expression is as follows:
{{T},{V}}=f(S),
wherein S is an environment state potential field, and S belongs to R2The form of the method is a matrix obtained after map rasterization, and the numerical value of each point of the matrix is the situation field force of the corresponding scene coordinate point. { T } is a threat degree set of each unit of both security sides, { V } is an importance degree set of each unit of both security sides, and f is a mapping relation of threat degree and importance degree obtained according to situation fields.
In particular implementations, the mapping from the state potential field to the threat degree and importance degree can adopt a rule-based method, a learning-based method, or the like. If a rule-based method is used, the following expressions may be referenced:
Figure BDA0003495659810000131
Figure BDA0003495659810000132
wherein t_j is the threat degree of unit j, E_j is the situation field force at unit j, E_i is the situation field force at another unit i, and n is the total number of units other than unit j. V_j is the importance degree of unit j, and C_j is the relative closeness of unit j to the optimal unit, calculated from the state potential field using the Analytic Hierarchy Process (AHP) and the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS).
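The TOPSIS relative-closeness computation named above can be sketched as follows; the patent does not give its exact criteria or weighting, so this assumes an already weighted matrix of benefit-type scores:

```python
import numpy as np

def topsis_closeness(scores):
    """Relative closeness of each alternative to the ideal solution (TOPSIS).

    scores: n_alternatives x n_criteria matrix of benefit-type, already
    weighted/normalized scores. Returns C_j in [0, 1]; larger is better.
    """
    s = np.asarray(scores, dtype=float)
    ideal = s.max(axis=0)            # positive ideal solution
    anti = s.min(axis=0)             # negative ideal solution
    d_pos = np.linalg.norm(s - ideal, axis=1)
    d_neg = np.linalg.norm(s - anti, axis=1)
    return d_neg / (d_pos + d_neg)
```

An alternative equal to the column-wise maxima gets closeness 1, and one equal to the minima gets 0.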
If a learning-based method is used to calculate the threat degree and importance degree, the situation field matrix can be input into a deep neural network, and the threat degree and importance degree of the units of both parties are calculated through convolution, pooling, and similar operations; the network structure is shown in fig. 4.
In the embodiment of the application, a rule-based method can be adopted, a scene with three units for both parties is set, and the threat degree and the importance degree of the units for both parties are calculated according to the state potential field to obtain the unit information of both parties.
S202: enumerating the strategy sets of the two parties, and establishing a strategy game matrix of the two parties under the three targets of success rate, time consumption and resource consumption. The unit number of the A side is set as M, the unit number of the B side is set as N, and one unit can be only distributed to one unit. There are (+1) optional tasks per unit on party a, i.e., assigning to any unit on party B, or not assigning a task). Therefore, the strategy available for the A party is (N +1)MIt is expressed as:
Figure BDA0003495659810000137
wherein an arbitrary strategy α_u is an M-element array, and each element of the array is defined as:
Figure BDA0003495659810000133
The strategy definitions of party B are similar; there are (M+1)^N in total, recorded as:
Figure BDA0003495659810000138
In the setting scenario, each side has 81 alternative strategies.
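The strategy enumeration can be sketched with itertools; each strategy is an M-element array whose ith entry names the B-party unit assigned to A-unit i, with 0 meaning no task (illustrative, not the patent's code):

```python
from itertools import product

def enumerate_strategies(m, n):
    """All task-allocation strategies for a side with m units.

    Each unit is assigned to one of the n opposing units (1..n) or to
    no task (0), giving (n+1)**m strategies in total.
    """
    return list(product(range(n + 1), repeat=m))
```

The returned list indexes the rows (or columns) of the game matrices built below.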
When the success rate of each strategy is calculated in the embodiment of the present application, once both parties have determined their task allocation strategies, the success rates of the two parties under those strategies are defined as follows:
Figure BDA0003495659810000134
Pb=-Pa
wherein P_a is the success rate of party A, P_b is the success rate of party B, T_a^i is the threat degree of the ith unit of party A, T_b^j is the threat degree of the jth unit of party B, V_a^i is the importance of the ith unit of party A, V_b^j is the importance of the jth unit of party B, and x_ij = 1 if the ith unit of party A is assigned to the jth unit of party B in the strategy and x_ij = 0 otherwise; y_ij is the corresponding index in the strategy of party B.
When the predicted time consumption of each strategy of the two parties is calculated, the time consumption of a task consists of two parts: static time, i.e., the time required for task execution; and dynamic time, i.e., the time required for each unit to travel to its corresponding unit.
For the static time part, according to prior knowledge, the embodiment of the present application may set the static time to a fixed value t0.
For the dynamic time part, the embodiment of the present application may set a situation risk threshold δ; the region where the risk in the situation field exceeds the threshold, combined with the obstacle region detected by sensors, is defined as the impassable region. For the passable region, a post-pruned rapidly-exploring random tree (RRT) algorithm is adopted for path planning; the algorithm flow chart and specific steps are shown in fig. 5:
First, the path tree is initialized: the starting point q_init is added to the tree, and a point q_rand is sampled randomly in space. If q_rand is not in the passable region, sampling is repeated; otherwise the process continues. The point q_near in the current tree closest to q_rand is found, and the tree is expanded one fixed step from q_near toward q_rand to obtain a new node q_new. If the expansion edge q_near-q_new does not pass through the impassable region, q_new is added to the tree; otherwise the node is discarded. The distance from the new node q_new to the planned end point q_goal is then computed; if it is smaller than the set value, planning is considered finished, otherwise the process returns to step S502 and the loop continues. Starting from the tree node nearest the end point q_goal, the path is traced back node by node to the starting point q_init, giving a passable path connecting the start and end points. Finally, post-pruning is introduced: if the straight line connecting two points some distance apart in the path-point sequence does not pass through the impassable region, the intermediate path points are deleted and the two points are connected directly, making the path smoother.
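The RRT loop with post-pruning can be sketched as follows; the collision test blocked(p, q), the sampling bounds, and all parameter defaults are assumptions, not the patent's code:

```python
import math
import random

def rrt_post_pruned(start, goal, blocked, step=1.0, goal_tol=1.0,
                    max_iter=5000, xmax=20.0, ymax=20.0):
    """Post-pruned RRT sketch.

    blocked(p, q) -> True if the segment p-q crosses the impassable area
    (call with p == q to test a single sample point).
    Returns a list of waypoints from start to goal, or None on failure.
    """
    random.seed(1)                              # deterministic for the sketch
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iter):
        q_rand = (random.uniform(0, xmax), random.uniform(0, ymax))
        if blocked(q_rand, q_rand):             # resample outside passable area
            continue
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d == 0:
            continue
        t = min(1.0, step / d)                  # fixed-step expansion
        q_new = (q_near[0] + t * (q_rand[0] - q_near[0]),
                 q_near[1] + t * (q_rand[1] - q_near[1]))
        if blocked(q_near, q_new):              # discard colliding edges
            continue
        nodes.append(q_new)
        parent[len(nodes) - 1] = i_near
        if math.dist(q_new, goal) < goal_tol:   # close enough: backtrack
            path, i = [goal], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            path.reverse()
            # post-pruning: jump to the farthest directly reachable waypoint
            pruned, i = [path[0]], 0
            while i < len(path) - 1:
                j = len(path) - 1
                while j > i + 1 and blocked(path[i], path[j]):
                    j -= 1
                pruned.append(path[j])
                i = j
            return pruned
    return None
```

With no obstacles, pruning collapses the zig-zag tree path to a single straight segment from start to goal.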
After the above steps are completed, the passable path between any two units can be obtained. The straight-line distances between adjacent path points are summed to obtain the total path length, which is divided by the speed of the corresponding unit to give the dynamic time required for movement. The time required for unit i of party A to travel to unit j of party B under the strategy is denoted t_ij, so the time required by a strategy of party A can be expressed as:
t_sum = max{t_ij} + t0; i = 1, 2, …, M; j = 1, 2, …, N.
the time consumption of the B-party strategy can be calculated according to similar ideas.
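The time formula t_sum = max{t_ij} + t0 can be sketched as follows; path and speed containers are illustrative assumptions:

```python
import math

def strategy_time(paths, speeds, t0):
    """Predicted time of a strategy: max unit travel time plus static time t0.

    paths: dict (i, j) -> list of waypoints for A-unit i going to B-unit j;
    speeds: dict i -> speed of A-unit i.
    """
    def travel(i, j):
        pts = paths[(i, j)]
        dist = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))
        return dist / speeds[i]
    return max(travel(i, j) for (i, j) in paths) + t0
```

The maximum is taken because the strategy finishes only when its slowest unit arrives.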
In the setting scenario, t0 is set to 100 s and the situation risk threshold δ to 7; the path planned using the post-pruned RRT algorithm is shown in fig. 6. In the figure, the horizontal and vertical axes are actual position coordinates in the environment, the rectangles are areas where obstacles lie or the situation risk exceeds the threshold, the lower circles are the three units of party A, the upper circles are the three units of party B, and the curves are the paths planned by the RRT from each unit of party A to each unit of party B. Although the planned paths contain broken lines and may not satisfy the vehicle kinematic constraints, their lengths are close to the actual conditions, so they have high reference value and are reasonable for estimating the moving time.
When calculating the predicted resource consumption of each strategy of both parties, the embodiment of the present application may give a predicted resource consumption calculation formula that takes into account the number of units invested in each strategy and their respective importance degrees:
Figure BDA0003495659810000151
Figure BDA0003495659810000152
wherein, each variable is defined as: raFor resource consumption of party A, ReIn order to achieve the resource consumption of the B-party,
Figure BDA0003495659810000153
for the ith unit importance of party a,
Figure BDA0003495659810000154
is the jth unit importance of the B-side, xijIs defined as:
Figure BDA0003495659810000155
yijand the indexes are corresponding indexes in the strategy of the B party.
By applying the above steps in the setting scenario, a pair of 81 × 81-dimensional game matrices can be obtained under each of the three targets. Taking the strategy success rate matrix of party A as an example, part of it is shown in fig. 7.
S203: preprocess the game matrices (simplification and conversion) and solve optimally using a differential evolution algorithm. First, the game matrices under each target (success rate, task time consumption, resource consumption) can be normalized respectively using the range variation method. The value a_ij in the ith row and jth column of each matrix becomes, after normalization:
a'_ij = (a_ij − a_min) / (a_max − a_min),
wherein a_min is the minimum value in the matrix and a_max is the maximum value. After mapping, every value in the matrix lies between 0 and 1, so the magnitudes of values in different target matrices are consistent, which facilitates subsequent weighting and combining.
Secondly, the game matrixes can be combined according to different weights according to requirements, and a plurality of groups of independent double-matrix games are formed. According to the embodiment of the application, three requirements of high success rate, short consumed time and a balance strategy can be set, and the weights of all targets under different requirements can be set as:
High success rate: merged matrix = 0.9 × success rate + 0.05 × time consumption + 0.05 × resource;
Short time consumption: merged matrix = 0.7 × success rate + 0.25 × time consumption + 0.05 × resource;
Balanced strategy: merged matrix = 0.8 × success rate + 0.15 × time consumption + 0.05 × resource.
Since the success rate is always the first element, the embodiment of the present application may set the success rate weight to the maximum. In addition, matrix modeling is not required to be repeated for different requirements, and only solution is needed after respective summation.
Further, according to the embodiment of the application, the double-matrix game model can be converted into a quadratic programming model according to the relation between games and mathematical programming. Let the numbers of strategies of the two parties be m and n respectively, and the game matrices of the two parties be M = (a_ij) and N = (b_ij); the quadratic programming problem obtained after transformation is expressed as:
max f(x, y, v1, v2) = x^T (M + N) y − v1 − v2,
s.t. M y ≤ v1 · e_m, N^T x ≤ v2 · e_n, Σ_{i=1}^{m} x_i = 1, Σ_{j=1}^{n} y_j = 1, x ≥ 0, y ≥ 0,
where e_m and e_n are all-ones vectors of dimensions m and n,
wherein x = (x1, x2, …, xm) and y = (y1, y2, …, yn) are respectively the probabilities with which the two parties select each strategy in the mixed-strategy solution, and v1 and v2 are two introduced auxiliary variables. The objective function value of the quadratic programming problem is never greater than zero, and it is zero if and only if the game problem attains an optimal solution.
Finally, the embodiment of the application can solve the preprocessed game problem with a differential evolution algorithm, a kind of genetic algorithm characterized by fast convergence, simple implementation, and suitability for real-number rather than binary coding. In the embodiment of the present application, the population of the differential evolution is a set of mixed-strategy solutions of the game problem, in which each independent variable is the probability of selecting a certain strategy. The algorithm flow chart and specific steps are shown in fig. 8:
Step S801: population initialization. A set of solutions of the model to be optimized is randomly generated.
Step S802: fitness calculation. Each solution is substituted into the objective function to judge its effect.
Step S803: termination judgment. It is judged whether an optimal solution has been obtained or the specified number of iterations has been reached; if so, the optimal solution under the existing conditions is given, otherwise the loop continues.
Step S804: selection. A portion of the solutions is selected for retention according to population fitness.
Step S805: crossover. The retained solutions are exchanged pairwise with a certain probability.
Step S806: mutation. Each group of solutions is mutated through a differential strategy, with the formula:
vi(g+1)=xbest(g)+F·(xr1(g)-xr2(g)),
wherein v_i(g+1) is the new solution obtained after mutation, x_best(g) is the existing optimal solution, x_r1(g) and x_r2(g) are two randomly chosen existing solutions, and F is the mutation factor.
And looping the steps S802 to S806 until the termination condition is met.
S204: and displaying the solving progress in real time, stopping solving at any time according to the requirements of a decision maker, and calculating the strategy suggestion of multi-stage task allocation. The embodiment of the application can display the change of the objective function value along with the iteration times in real time in the form of a line graph, and provides the current optimization progress for a decision maker to refer to. If the decision maker considers that the current solving progress meets the requirement, a 'stopping solving' instruction can be issued at any time, and the device gives the optimal solution under the current progress. The progress in the optimization process under the set scenario is shown in fig. 9.
S205: give the possible responses of party B under each suggested strategy, and display the solution results under the different strategy targets in list form for the decision maker to select. After the differential evolution solution is completed, a multi-stage task allocation mechanism can be introduced, and second-stage allocation is performed on the units of both parties that remain after the first-stage task allocation is completed, so as to meet the requirements of practical application. First, an effect threshold γ_0 of the A-party strategy is set, and the actual effect γ of the A-party strategy is calculated by the following formula:
Figure BDA0003495659810000171
wherein T_a^i is the threat degree of the ith unit of party A, V_b^j is the importance of the jth unit of party B, and x_ij = 1 if the ith unit of party A is assigned to the jth unit of party B in the strategy, and x_ij = 0 otherwise.
the embodiment of the application can be used for comparing the actual effect gamma with the effect threshold value
Figure BDA0003495659810000175
In contrast, if
Figure BDA0003495659810000176
The strategy of the first stage is considered to have no sufficient effect on the B party, so that the planning of the second stage is needed, and a round of task distribution is developed.
Before the second-stage task allocation is calculated, the damage degree of each unit of the two parties needs to be calculated according to the first-stage strategies of both parties, to judge whether each unit can participate in the second-stage task. Here, the damage degree threshold of each unit is set as η_0, and the actual damage degree η_j of the jth unit is calculated by the following formula:
Figure BDA0003495659810000177
wherein the variables are defined as in the preceding formulas. If η_j ≥ η_0, the unit is considered too damaged after completing the first-stage task to execute the second-stage task, so it is excluded.
Next, in the embodiment of the present application, the policy set of the second stage of the a-party may be listed by using the related method in the above steps according to the filtered list of the remaining units of the second stage of the two parties. In order to ensure the real-time performance as much as possible and prevent the high uncertainty of long-term prediction, the task allocation in the second stage only considers the strategy of the party A, does not use a game model any more, and does not consider the possible response of the party B.
Meanwhile, due to the uncertainty of the positions of the two parties after the first-stage task is completed, the time and resource consumption of the strategy are not considered any more in the second-stage task allocation, and only the success rate is taken as the only target, and after the strategy set of the A party is listed, the strategy with the highest success rate is directly selected to be used as the strategy suggestion of the second stage. The design can also accelerate the solving speed.
Further, after the strategy of the party A is listed, possible responses of the party B under each strategy can be given, and the solving results under different strategy targets are displayed in a list form for decision makers to select.
Specifically, according to the double-matrix game principle, the embodiment of the application can give the B-party response with the optimal effect under the first-stage strategy of party A. If the strategy number selected by party A is i0, the B-party response strategy number j0 satisfies the following relationship in the B-party game matrix N:
j0 = argmax_j N[i0, j],
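The best-response rule j0 = argmax_j N[i0, j] is a one-liner with NumPy (illustrative):

```python
import numpy as np

def best_response(N, i0):
    """B's best response j0 = argmax_j N[i0, j] to A's strategy i0."""
    return int(np.argmax(np.asarray(N)[i0]))
```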
and the information solved for party A under the three requirements of high success rate, short time consumption and balanced strategy, including strategy suggestions, specific strategy contents, strategy success rate and time consumption, second-stage strategies and predicted party-B responses, is displayed in tabular form on the visualization interface of the device for the decision maker's reference and selection.
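The best-response relation j0 = arg max_j N[i0, j] above can be sketched directly. The matrix values below are invented for illustration.

```python
# Sketch of the relation j0 = arg max_j N[i0, j]: given party B's game matrix N
# and party A's chosen strategy index i0, party B's predicted response is the
# column index with the highest payoff in row i0. Matrix values are invented.

def predict_b_response(N, i0):
    """N: party B's payoff matrix (list of rows); i0: party A's strategy index."""
    row = N[i0]
    return max(range(len(row)), key=lambda j: row[j])

N = [[3, 1, 2],
     [0, 4, 1]]
print(predict_b_response(N, 0))  # 0 (payoff 3 is largest in row 0)
print(predict_b_response(N, 1))  # 1 (payoff 4 is largest in row 1)
```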
According to the unmanned vehicle task allocation game method based on the state potential field, a potential field is established from a global situation analysis of the environment and used to calculate strategy success rates, so that one security party can make the best response to the other party's strategy. Multiple strategy options are provided to the decision maker, fully reflecting the decision maker's role, while overall time consumption is balanced and resource loss is reduced, giving the method good practicability. This addresses the problems in the related art of poor global situation analysis, inability to respond reasonably to the other party's security strategy, failure to reflect the decision maker's role, and high task time and resource consumption.
The unmanned vehicle task allocation gaming device based on the potential field is described next with reference to the attached drawings.
Fig. 10 is a block schematic diagram of an unmanned vehicle mission allocation gaming device based on a state potential field according to an embodiment of the present application.
As shown in fig. 10, the state potential field based unmanned vehicle task allocation gaming device 10 includes: a calculation module 100, an integration module 200 and a decision module 300.
Specifically, the calculating module 100 is configured to calculate the threat degree and the importance degree of each unit of the two security protection parties based on the current environment situation.
And the integrating module 200 is configured to respectively establish a policy game matrix of both parties under at least one target according to a policy set of each unit of both parties of the security protection.
The decision module 300 is configured to solve the two-party policy game matrix until a decision requirement is met, calculate a plurality of policy suggestions for multi-stage task allocation, obtain a possible response of the opposite party under each policy suggestion, display a solution result of the two-party policy game matrix, and determine an optimal policy suggestion.
Optionally, in an embodiment of the present application, the integrating module 200 includes: the device comprises an enumeration unit, a first calculation unit and a generation unit.
The enumeration unit is used for enumerating policy sets of each unit of both security protection parties.
The first calculating unit is used for calculating one or more targets among success rate, estimated time consumption and estimated resource consumption for each strategy of the two parties in the strategy set.
And the generating unit is used for generating a two-party strategy game matrix under one or more targets.
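The three steps above (enumerate the strategy sets, evaluate each objective per strategy pair, assemble the matrices) can be sketched as follows. The evaluator functions, strategy names and values are placeholder assumptions; in the patent these quantities come from the potential-field situation analysis.

```python
# Illustrative sketch of generating one game matrix per objective (success rate,
# estimated time, estimated resource consumption), indexed by
# (party-A strategy, party-B strategy). The evaluators and values below are
# invented placeholders for the patent's potential-field-based calculations.

def build_game_matrices(a_strategies, b_strategies, evaluators):
    """evaluators: dict of objective name -> f(a_strategy, b_strategy) -> value."""
    return {
        name: [[f(a, b) for b in b_strategies] for a in a_strategies]
        for name, f in evaluators.items()
    }

success = {("A1", "B1"): 0.8, ("A1", "B2"): 0.6,
           ("A2", "B1"): 0.5, ("A2", "B2"): 0.9}
matrices = build_game_matrices(
    ["A1", "A2"], ["B1", "B2"],
    {"success_rate": lambda a, b: success[(a, b)]})
print(matrices["success_rate"])  # [[0.8, 0.6], [0.5, 0.9]]
```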
Optionally, in an embodiment of the present application, the decision module 300 includes: a preprocessing unit and a second computing unit.
The preprocessing unit is used for preprocessing the strategy game matrixes of the two parties.
And the second computing unit is used for solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
Optionally, in an embodiment of the present application, the preprocessing unit includes: a merging subunit and a conversion subunit.
The merging subunit is used for normalizing each game matrix and merging the matrices with different weights according to the target requirements, obtaining multiple groups of double-matrix game problems.
The conversion subunit is used for converting, based on the multiple groups of double-matrix game problems and the relation between games and mathematical programming, the double-matrix game model into a quadratic programming model, obtaining the preprocessed game problem to be solved by the differential evolution algorithm.
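As a concrete illustration of this preprocessing and solving pipeline, the sketch below normalizes and weight-merges per-objective matrices, then searches for a mixed-strategy equilibrium of a double-matrix game with a small hand-rolled differential evolution loop. The patent's exact quadratic programming formulation is not reproduced here; instead the sketch minimizes an equivalent "Nash gap" objective that is non-negative and zero exactly at an equilibrium. All parameter values are illustrative assumptions.

```python
import math
import random

def normalize(M):
    """Min-max normalize a matrix to [0, 1]."""
    flat = [v for row in M for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in M]

def merge(mats, weights):
    """Weighted elementwise sum of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(w * M[i][j] for M, w in zip(mats, weights))
             for j in range(cols)] for i in range(rows)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def nash_gap(A, B, x, y):
    """Total gain available to either player by deviating; 0 iff (x, y) is an equilibrium."""
    Ay = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    xB = [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y))]
    vA = sum(x[i] * Ay[i] for i in range(len(x)))
    vB = sum(y[j] * xB[j] for j in range(len(y)))
    return (max(Ay) - vA) + (max(xB) - vB)

def solve_bimatrix_de(A, B, pop_size=30, gens=200, F=0.7, CR=0.9, seed=1):
    """Differential evolution over strategy logits; softmax keeps strategies on the simplex."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    dim = m + n
    obj = lambda v: nash_gap(A, B, softmax(v[:m]), softmax(v[m:]))
    pop = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size)]
    fit = [obj(p) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            jr = rng.randrange(dim)  # guarantees at least one mutated component
            trial = [pop[a][d] + F * (pop[b][d] - pop[c][d])
                     if (rng.random() < CR or d == jr) else pop[i][d]
                     for d in range(dim)]
            f = obj(trial)
            if f < fit[i]:
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=fit.__getitem__)
    return softmax(pop[best][:m]), softmax(pop[best][m:]), fit[best]

# Matching pennies as a toy double-matrix game: the unique mixed equilibrium
# plays each strategy with probability 0.5 for both parties.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
x, y, gap = solve_bimatrix_de(A, B)
print(round(gap, 3), [round(p, 2) for p in x], [round(p, 2) for p in y])
```

The normalize/merge helpers correspond to the merging subunit; the sign convention when mixing maximize objectives (success rate) with minimize objectives (time, resources) is left to the caller and is an assumption not specified above.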
Optionally, in an embodiment of the present application, the decision module 300 is further configured to: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
Optionally, in an embodiment of the present application, the decision module 300 further includes: the system comprises a response unit, a visualization unit and a decision unit.
The response unit is used for determining the response with the optimal effect under the first-stage strategy according to a preset double-matrix game principle.
And the visualization unit is used for displaying the strategy suggestions, the strategy specific contents, the strategy success rate and time consumption, the second-stage strategies and the responses which are solved under one or more decision requirements on a visualization interface in a tabular form.
And the decision unit is used for determining the optimal strategy suggestion according to a selection instruction generated by the decision maker based on the information on the visual interface.
It should be noted that the foregoing explanation of the state potential field based unmanned vehicle task allocation gaming method embodiment also applies to the state potential field based unmanned vehicle task allocation gaming device of this embodiment, and details are not repeated here.
According to the unmanned vehicle task allocation gaming device based on the state potential field, a potential field is established from a global situation analysis of the environment and used to calculate strategy success rates, so that one security party can make the best response to the other party's strategy. Multiple strategy options are provided to the decision maker, fully reflecting the decision maker's role, while overall time consumption is balanced and resource consumption is reduced, giving the device good practicability. This addresses the problems in the related art of poor global situation analysis, inability to respond reasonably to the other party's security strategy, failure to reflect the decision maker's role, and high task time and resource consumption.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 1101, a processor 1102, and a computer program stored on the memory 1101 and executable on the processor 1102.
The processor 1102, when executing the program, implements the state potential field based unmanned vehicle task allocation gaming method provided in the embodiments above.
Further, the electronic device further includes:
a communication interface 1103 for communicating between the memory 1101 and the processor 1102.
A memory 1101 for storing computer programs that are executable on the processor 1102.
The memory 1101 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk storage device.
If the memory 1101, the processor 1102 and the communication interface 1103 are implemented independently, the communication interface 1103, the memory 1101 and the processor 1102 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 1101, the processor 1102 and the communication interface 1103 are integrated on one chip, the memory 1101, the processor 1102 and the communication interface 1103 may complete communication with each other through an internal interface.
The processor 1102 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the state potential field based unmanned vehicle task allocation gaming method as described above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (14)

1. An unmanned vehicle task allocation gaming method based on a state potential field, characterized by comprising the following steps:
calculating the threat degree and the importance degree of each unit of the two security parties based on the current environment situation;
respectively establishing two party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and
and solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, and displaying the solving result of the two-party strategy game matrix while acquiring the possible response of the opposite party under each strategy suggestion to determine the optimal strategy suggestion.
2. The method of claim 1, wherein the respectively establishing a two-party policy gaming matrix under at least one target comprises:
listing the strategy sets of each unit of the security protection two parties;
calculating one or more targets of success rate, estimated time consumption and estimated resource consumption of each strategy of the two parties in the strategy set;
and generating a two-party strategy game matrix under the one or more targets.
3. The method of claim 1, wherein said solving said two-party strategy game matrix comprises:
preprocessing the strategy game matrixes of the two parties;
and solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
4. The method of claim 3, wherein the pre-processing the two-party policy gaming matrix comprises:
carrying out normalization processing on each game matrix, and combining the game matrices according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems;
based on a plurality of groups of double-matrix game problems, converting a double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming to obtain the preprocessed game problem solved by the differential evolution algorithm.
5. The method of claim 1, wherein said solving said two-party strategy game matrix comprises:
and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
6. The method according to claim 4 or 5, wherein the step of displaying the solution result of the policy game matrix of the two parties while obtaining the possible response of the other party under each policy suggestion and determining the best policy suggestion comprises the steps of:
determining response with optimal effect under a first-stage strategy according to a preset double-matrix game principle;
the strategy suggestions, the specific contents of the strategies, the strategy success rate and the time consumption, the strategies at the second stage and the responses which are solved under one or more decision requirements are displayed on a visual interface in a table form;
and determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
7. An unmanned vehicle task allocation gaming device based on a state potential field, comprising:
the computing module is used for computing the threat degree and the importance degree of each unit of the security protection two parties based on the current environment situation;
the integration module is used for respectively establishing two-party strategy game matrixes under at least one target according to the strategy set of each unit of the two parties of the security protection; and
and the decision module is used for solving the two-party strategy game matrix until the decision requirement is met, calculating a plurality of strategy suggestions distributed by the multi-stage task, acquiring the possible response of the opposite party under each strategy suggestion, displaying the solving result of the two-party strategy game matrix and determining the optimal strategy suggestion.
8. The apparatus of claim 7, wherein the integrating module comprises:
the enumeration unit is used for enumerating the policy sets of the units of the security protection parties;
the first calculating unit is used for calculating one or more targets among success rate, estimated time consumption and estimated resource consumption for each strategy of the two parties in the strategy set;
and the generating unit is used for generating a two-party strategy game matrix under the one or more targets.
9. The apparatus of claim 7, wherein the decision module comprises:
the preprocessing unit is used for preprocessing the two-party strategy game matrix;
and the second computing unit is used for solving the preprocessed two-party strategy game matrix by using a differential evolution algorithm.
10. The apparatus of claim 9, wherein the pre-processing unit comprises:
the merging unit is used for carrying out normalization processing on each game matrix and merging each game matrix according to different weights according to target requirements to obtain a plurality of groups of double-matrix game problems;
and the conversion unit is used for converting the double-matrix game model into a quadratic programming model according to the relation between the game and the mathematical programming based on a plurality of groups of double-matrix game problems to obtain the preprocessed game problems solved by the differential evolution algorithm.
11. The apparatus of claim 7, wherein the decision module is further configured to: and displaying the objective function value along with the change of the iteration times in real time in a line graph mode, and displaying the current optimization progress.
12. The apparatus of claim 10 or 11, wherein the decision module further comprises:
the response unit is used for determining the response with the optimal effect under the first-stage strategy according to a preset double-matrix game principle;
the visualization unit is used for displaying the strategy suggestions, the specific contents of the strategies, the success rate and the time consumption of the strategies, the strategies at the second stage and the responses which are solved under one or more decision requirements on a visualization interface in a tabular form;
and the decision unit is used for determining the optimal strategy suggestion according to a selection instruction generated by a decision maker based on the information on the visual interface.
13. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the state potential field based unmanned vehicle task allocation gaming method of any one of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor to implement the state potential field based unmanned vehicle task allocation gaming method of any one of claims 1-6.
CN202210116279.4A 2022-01-30 2022-01-30 Unmanned vehicle task allocation game method and device based on state potential field Active CN114548409B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210116279.4A CN114548409B (en) 2022-01-30 2022-01-30 Unmanned vehicle task allocation game method and device based on state potential field

Publications (2)

Publication Number Publication Date
CN114548409A true CN114548409A (en) 2022-05-27
CN114548409B CN114548409B (en) 2023-01-10

Family

ID=81672838


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2271047A1 (en) * 2009-06-22 2011-01-05 Deutsche Telekom AG Game theoretic recommendation system and method for security alert dissemination
CN107623697A (en) * 2017-10-11 2018-01-23 北京邮电大学 A kind of network security situation evaluating method based on attacking and defending Stochastic Game Model
CN110119773A (en) * 2019-05-07 2019-08-13 中国科学院自动化研究所 Global Situation Assessment side's method, the system, device of Strategic Games system
CN110443473A (en) * 2019-07-22 2019-11-12 合肥工业大学 Multiple no-manned plane collaboration target assignment method and system under Antagonistic Environment
CN113114492A (en) * 2021-04-01 2021-07-13 哈尔滨理工大学 Security situation perception algorithm based on Markov differential game block chain model
CN113282061A (en) * 2021-04-25 2021-08-20 南京大学 Unmanned aerial vehicle air game countermeasure solving method based on course learning
CN113625740A (en) * 2021-08-27 2021-11-09 北京航空航天大学 Unmanned aerial vehicle air combat game method based on transfer learning pigeon swarm optimization
CN113938394A (en) * 2021-12-14 2022-01-14 清华大学 Monitoring service bandwidth allocation method and device, electronic equipment and storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YANGMING KANG et al.: "Beyond-Visual-Range Tactical Game Strategy for Multiple UAVs", IEEE *
YIYANG WANG et al.: "Adversarial Online Learning With Variable Plays in the Pursuit-Evasion Game: Theoretical Foundations and Application in Connected and Automated Vehicle Cybersecurity", IEEE Access *
WANG Ershen et al.: "UAV Swarm Air-Ground Confrontation Model with an Improved Target Payoff Function", Journal of Nanjing University of Aeronautics and Astronautics *
CHEN Zihan et al.: "Moving Target Defense Technology Based on a Stackelberg-Markov Asymmetric Three-Party Game Model", Chinese Journal of Computers *

Also Published As

Publication number Publication date
CN114548409B (en) 2023-01-10

Similar Documents

Publication Publication Date Title
Guimarães et al. The two-echelon multi-depot inventory-routing problem
US20240054444A1 (en) Logistics scheduling method and system for industrial park based on game theory
Mufalli et al. Simultaneous sensor selection and routing of unmanned aerial vehicles for complex mission plans
Han et al. Appointment scheduling and routing optimization of attended home delivery system with random customer behavior
Faratin et al. Using similarity criteria to make issue trade-offs in automated negotiations
Gupta et al. A polynomial goal programming approach for intuitionistic fuzzy portfolio optimization using entropy and higher moments
Cope Bayesian strategies for dynamic pricing in e‐commerce
CN108446978A (en) Handle the method and device of transaction data
Gerding et al. Multi-issue negotiation processes by evolutionary simulation, validation and social extensions
CN112328646B (en) Multitask course recommendation method and device, computer equipment and storage medium
CN110472792A (en) A kind of route optimizing method for logistic distribution vehicle based on discrete bat algorithm
CN110222824B (en) Intelligent algorithm model autonomous generation and evolution method, system and device
WO2022163003A1 (en) Model generation device, estimation device, model generation method, and model generation program
Aras et al. Bilevel models on the competitive facility location problem
CN114548409B (en) Unmanned vehicle task allocation game method and device based on state potential field
CN108650191B (en) Decision method for mapping strategy in virtual network
JP2005536788A (en) How to determine the value given to different parameters of the system
Djenadi et al. Energy-aware task allocation strategy for multi robot system
Asmuni et al. A novel fuzzy approach to evaluate the quality of examination timetabling
El-Hussieny et al. Robotic exploration: new heuristic backtracking algorithm, performance evaluation and complexity metric
EP3961507A1 (en) Optimal policy learning and recommendation for distribution task using deep reinforcement learning model
CN112445621A (en) Static routing planning method and device, electronic equipment and storage medium
Ramkishore et al. Optimal bargaining mechanisms with refusal cost
Brázdil et al. Solvency Markov decision processes with interest
Kolomvatsos et al. Automatic fuzzy rules generation for the deadline calculation of a seller agent

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant