CN112270103B - Cooperative strategy inversion identification method based on multi-agent game - Google Patents


Info

Publication number
CN112270103B
Authority
CN
China
Prior art keywords
agent
game
player
cooperative
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011236015.XA
Other languages
Chinese (zh)
Other versions
CN112270103A (en)
Inventor
俞成浦
张振华
李尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Innovation Center of Beijing University of Technology
Original Assignee
Chongqing Innovation Center of Beijing University of Technology
Application filed by Chongqing Innovation Center of Beijing University of Technology
Priority to CN202011236015.XA
Publication of CN112270103A
Application granted
Publication of CN112270103B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/10 Noise analysis or noise optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a cooperative strategy inversion identification method based on a multi-agent game, aimed at a multi-agent closed-loop dynamic game decision model in which multiple players participate. Each player detects and monitors a multi-unmanned-aerial-vehicle game system by radar, acquires the mixed signal generated by the system, and separates it. Inversion modeling is then carried out on the acquired system state information and the control input information of each agent, and the cooperative game strategy characterization matrix of each agent in the game is identified.

Description

Cooperative strategy inversion identification method based on multi-agent game
Technical Field
The invention relates to the technical field of system identification and parameter estimation, and in particular to a cooperative strategy inversion identification method based on a multi-agent game.
Background
Inverse problems are a broad class of problems concerned with inferring causes from observations or measurements, i.e. determining the parameters (or model parameters) that characterize a problem, starting from observations and some general principles (or models). Inversion identification can recover mechanism-characterizing parameters that cannot be observed directly, and is therefore widely applied in control, communication, medicine, geophysics, and other fields. According to the mechanism used, inversion can be divided into linear inversion, generalized linear inversion, nonlinear inversion, iterative inversion, optimization-based inversion, and so on.
Cooperative strategy inversion identification based on multi-agent games is of strong theoretical interest and has important application value in military affairs. For example, in actual combat the environment of the agents is uncertain, the decision information is incomplete, and the conditions for interactive communication are limited, so the cluster behavior of multi-party agent combat units may be better described by an implicit cooperative decision mode. That is, the agents imitate the human mode of cooperation, do not depend on direct interaction, and realize implicit cooperative decisions through a role-based implicit cooperation framework, which can be approximately described by the non-cooperative dynamic game decision framework of a multi-agent system. If the cooperative game strategies can be identified by inversion from the observed game results of the participating players, the next actions of the other players can be predicted and a response can be prepared in advance, improving one's own winning rate.
For the dynamic game inversion problem, scholars at home and abroad have conducted some targeted research. In "Inverse Optimal Control for Identification in Non-Cooperative Differential Games", published in 2017, Simon Rothfuß et al. take a driver assistance system as an example: in a human-machine cooperation setting where the cooperative game strategy of the assistance system is known, the cooperative game strategy of the human is modeled by dynamic game inversion identification. In "Inverse Reinforcement Learning for Identification in Linear-Quadratic Dynamic Games", published in 2017, Florian Köpf et al. consider a two-player discrete-time closed-loop game system, use the coupled algebraic Riccati equations as optimization constraints, and design an inversion identification method for a physical ball-on-beam model on the premise that one player's cooperative game strategy is known. In "Inverse Open-Loop Noncooperative Differential Games and Inverse Optimal Control", published in 2019, Timothy L. Molloy et al. propose two finite-horizon open-loop nonlinear differential game inversion algorithms based on the minimum principle, and achieve high identification accuracy in a two-agent three-dimensional collision-avoidance game example.
At present, existing optimization-based inversion modeling methods, when applied to cooperative strategy inversion identification of multi-agent game systems, still have the following problems:
1. they are difficult to apply to closed-loop dynamic game decision models in which multiple players participate;
2. they require complete knowledge of one player's cooperative game strategy in order to make the inversion optimization model well-posed;
3. their inversion identification accuracy is insufficient under noise-free conditions, and their robustness is poor under noise interference.
Disclosure of Invention
To solve these problems, the invention provides a cooperative strategy inversion identification method based on a multi-agent game. The method targets the decision model of a closed-loop dynamic game system in which multiple players participate. It takes the collected motion states and control input information of the multi-UAV game system as input for inversion modeling, obtains the cooperative game strategy characterization matrices $Q_i$ and $R_{ij}$ by an inversion identification method, and finally verifies the effectiveness of the algorithm by using the identified weight matrices $Q_i$ and $R_{ij}$ to solve again for the Nash equilibrium of the forward game problem.
The invention provides a cooperative strategy inversion identification method based on multi-agent game, which has the following specific technical scheme:
S1: Acquiring noisy system state information and control input information of each player;
The mixed signals generated by the multi-agent game system are collected and separated, and the system state information and the control input information of each agent are then obtained by calculation. Their general forms are as follows:
$x(kT) = x^*(kT) + v(kT), \quad k = 1, \dots, M$
$u_i(kT) = u_i^*(kT) + w_i(kT), \quad k = 1, \dots, M$
where $k$ denotes the $k$-th observation point, $T$ denotes the observation period, $M$ denotes the total number of observation points in a given time period, $x(kT)$ and $u_i(kT)$ denote the observed system motion state and the observed control input of the $i$-th agent, respectively, and $v(kT)$ and $w_i(kT)$ denote the observation noise at the corresponding time.
S2: identifying an optimal feedback matrix for each player;
According to the obtained system state information and the control input information of each agent, an estimate $\hat{K}_i$ of the optimal feedback matrix of each agent in the game is identified.
S3: constructing a double-layer optimization model;
A dynamic equation of the multi-agent game system is acquired, and a double-layer optimization model for the inversion identification of the cooperative game strategy is established from the dynamic equation, the optimal game strategy equation, and the identified optimal feedback matrix of each player. The double-layer optimization model is as follows:
Figure GDA0004019558740000023
Figure GDA0004019558740000031
Figure GDA0004019558740000032
where $i, j \in \{1, \dots, N\}$, $N$ denotes the number of players,
Figure GDA0004019558740000033
Figure GDA0004019558740000034
denote positive definite matrices; the optimal estimates of $Q_i$ and $R_{ij}$ constitute the inversion identification of the cooperative strategy of the multi-agent game system.
S4: solving the constructed optimization model to obtain an inversion identification result;
The double-layer optimization model is converted into a quadratic programming problem and solved.
S5: carrying out accuracy verification on the inversion identification result;
The accuracy of the result is verified by calculating the relative error between the true value and the predicted value of the system state.
Further, in step S1, the separation of the acquired mixed signal includes the following steps:
A. when sufficient prior knowledge is available, a signal data separator for each player's agents is designed by the maximum a posteriori probability method or principal component analysis to realize signal separation;
B. when prior knowledge is insufficient, the multi-target data are separated by an ICA-based blind signal separation algorithm.
Further, in step S1, acquiring the noisy system state information and the control input information of each player includes the following steps:
S01: Obtaining the cooperative game strategy of each player according to that player's objective function;
First, each player finds its cooperative game strategy by minimizing the following objective function:
Figure GDA0004019558740000035
$x(t_0) = x_0$
where $i, j \in \{1, \dots, N\}$, $N$ denotes the number of players, and the cooperative characterization matrices $Q_i$ and $R_{ij}$ represent the cooperative game strategy adopted by the $i$-th player in the game;
S02: Obtaining the optimal system state and the control input of each player at the Nash equilibrium of the system, and generating the corresponding noise-corrupted observations;
Each player executes its own cooperative control strategy, obtained by solving the coupled algebraic Riccati equations, calculated as follows:
Figure GDA0004019558740000041
wherein:
Figure GDA0004019558740000042
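As a point of reference, a standard N-player linear-quadratic differential game written with the symbols used here ($Q_i$, $R_{ij}$, $P_i$, $K_i$) takes the form below. This is the textbook feedback-Nash formulation, stated as an assumption: the system matrices $A$ and $B_j$, the infinite horizon, and the closed-loop matrix $A_c$ are illustrative names that the text does not confirm.

```latex
\begin{aligned}
\dot{x}(t) &= A\,x(t) + \sum_{j=1}^{N} B_j\,u_j(t), \qquad x(t_0) = x_0, \\
J_i &= \int_{t_0}^{\infty} \Big( x^\top Q_i\,x + \sum_{j=1}^{N} u_j^\top R_{ij}\,u_j \Big)\,dt, \\
u_i^*(t) &= -K_i\,x^*(t), \qquad K_i = R_{ii}^{-1} B_i^\top P_i, \\
0 &= A_c^\top P_i + P_i A_c + Q_i + \sum_{j=1}^{N} K_j^\top R_{ij} K_j,
\qquad A_c = A - \sum_{j=1}^{N} B_j K_j .
\end{aligned}
```

In this form the $N$ Riccati equations are coupled because every $P_i$ appears in $A_c$ through the gains $K_j$ of all players.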
Further, in step S2, an estimate of the optimal feedback matrix is obtained by the least-squares identification method from the noisy system state information and the control input information of each agent, calculated as follows:
Figure GDA0004019558740000043
wherein
Figure GDA0004019558740000044
is the estimate of the optimal feedback matrix and $K_i$ is the feedback matrix of the $i$-th player, which is defined by:
$u_i(t) = -K_i x(t)$
Further, in step S3, obtaining the dynamic equation of the multi-agent game system includes the following steps:
A. if prior information on the agents of each participating player is available, the model type of each agent is determined by observation, and the system dynamic equation is obtained;
B. if no prior information on the agents of the participating players is available, the dynamic equations and control inputs of the agents participating in the game are identified by a blind system identification method for multi-input multi-output systems, and the system dynamic equation is obtained.
The dynamic equation is as follows:
Figure GDA0004019558740000045
$u_i(t)$ and $B_{ii}$ are expressed as follows:
Figure GDA0004019558740000051
Figure GDA0004019558740000052
where $i \in \{1, \dots, N\}$ and $N$ denotes the number of players participating in the game; $x(t) \in \mathbb{R}^n$ denotes the state of the whole system; $x_i(t)$ denotes the state information of the agents of the $i$-th player; $u_i(t)$ denotes the control input information of the agents of the $i$-th player;
Figure GDA0004019558740000053
denotes the control input information of the $m$-th agent of the $i$-th player, and $B_{ii}$ is a diagonal matrix.
Further, in step S4, the model is converted and solved as follows:
Using the relationship between the solution of the inner-layer coupled algebraic Riccati equations and the optimal feedback matrix estimate $\hat{K}_i$, the model is equivalently transformed into a quadratic programming problem, as follows:
Figure GDA0004019558740000055
s.t. $Q_i > 0$,
$R_{ij} > 0$
Figure GDA0004019558740000056
Figure GDA0004019558740000057
wherein G is T Is as I n Is to be
Figure GDA0004019558740000058
Is as I p Is D i Is as follows.
Further, in step S5, the error is calculated as follows:
Figure GDA0004019558740000059
$e_{\max} = \max(e_1, e_2, \dots, e_n)$
wherein
Figure GDA00040195587400000510
denotes the $j$-th component of the estimated state at time $kT$, and $e_{\max}$ indicates the relative error level of the algorithm.
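The error check in step S5 can be sketched in a few lines of Python. Since the per-component error formula is not reproduced in the text, the sketch assumes that $e_j$ is the 2-norm of the prediction error of state component $j$ over all $M$ observation points divided by the 2-norm of the true component. The function names and the sample data are hypothetical.

```python
import numpy as np

def relative_errors(x_true, x_pred):
    """Per-component relative error between two (n, M) state trajectories.

    Assumed definition (not confirmed by the text): norm of the prediction
    error of component j over all M samples, divided by the norm of the
    true component j.
    """
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    num = np.linalg.norm(x_true - x_pred, axis=1)
    den = np.linalg.norm(x_true, axis=1)
    return num / den

def max_relative_error(x_true, x_pred):
    """e_max = max(e_1, ..., e_n), the relative error level."""
    return float(np.max(relative_errors(x_true, x_pred)))

# Illustrative data: n = 2 state components, M = 4 observation points.
x_true = np.array([[1.0, 2.0, 2.0, 0.0],
                   [3.0, 4.0, 0.0, 0.0]])
x_pred = x_true.copy()
x_pred[1, 0] += 0.5  # perturb one sample of the second component

e = relative_errors(x_true, x_pred)
e_max = max_relative_error(x_true, x_pred)
```

Taking the maximum over components gives a single worst-case figure, which is what a pass/fail accuracy threshold would be compared against.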
The invention has the following beneficial effects:
1. For multi-agent game systems with more than two participating players, a cooperative strategy inversion optimization model based on the multi-agent game is established from the system states and the observed control inputs of the players collected over a period of time, and the cooperative game strategy of each agent is obtained by solving the model.
2. An algorithm is designed that equivalently converts the complex, nonlinearly constrained double-layer optimization problem into an easily computed quadratic programming problem; the method achieves high identification accuracy under noise-free conditions and a certain robustness under noise interference.
Drawings
FIG. 1 is a block diagram of the architecture of the multi-agent system closed loop dynamic gaming of the present invention;
FIG. 2 is a schematic diagram of an application scenario of the present invention;
FIG. 3 is a diagram of a system state prediction relative error distribution under a noise-free condition according to the present invention;
FIG. 4 is a system state prediction histogram under noise-free conditions of the present invention;
FIG. 5 is a diagram of the system state prediction relative error distribution under white Gaussian noise of 30dB in accordance with the present invention;
FIG. 6 is a system state prediction histogram in white Gaussian noise of 30dB in accordance with the present invention;
Detailed Description
In the following description, technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a cooperative strategy inversion identification method based on a multi-agent game, aimed at a multi-agent closed-loop dynamic game system in which multiple players participate. As shown in figure 1, in an unmanned aerial vehicle (UAV) swarm game scenario with multiple participating players, there is no cooperative relationship among the forces of the different parties. Each player detects and monitors the multi-UAV game system by radar and acquires the mixed signals generated by the system; the system state, the control input of each UAV, and the dynamic equation of the multi-agent game system are then obtained by separating the mixed signals and applying a blind system identification method for multi-input multi-output systems.
During the game, every player makes the decision most beneficial to itself, and the whole system finally reaches a Nash equilibrium. The UAV swarm system in which multiple players participate can be approximated as a linear-quadratic non-cooperative closed-loop dynamic game system. The optimal system state and the optimal control input of each player are obtained by solving for the Nash equilibrium of the system, where the cooperative game strategy of each player is characterized by the weight matrices $Q_i$ and $R_{ij}$.
The specific technical scheme of the embodiment of the invention is as follows:
S1: Acquiring the optimal system state under Nash equilibrium and the control input information of each player;
Each player detects and monitors the multi-UAV game system by radar, acquires the mixed signal generated by the system, and separates it to obtain the system state information and the control input information of each agent. In this embodiment, an ICA-based blind signal separation algorithm is used to separate the multi-target data signals, giving the following system state and control input information of each agent:
$x(kT) = x^*(kT) + v(kT), \quad k = 1, \dots, M$
$u_i(kT) = u_i^*(kT) + w_i(kT), \quad k = 1, \dots, M$
where $k$ denotes the $k$-th observation point, $T$ denotes the observation period, $M$ denotes the total number of observation points in a given time period, $x(kT)$ and $u_i(kT)$ denote the observed system motion state and the observed control input of the $i$-th agent, respectively, and $v(kT)$ and $w_i(kT)$ denote the observation noise at the corresponding time.
Based on the obtained system state information and the control input information of each agent, the multi-UAV closed-loop dynamic game control system is modeled and the dynamic equation of the system is constructed. In this embodiment, three parties participate in the UAV swarm game, and the dynamic equation of the system is as follows:
Figure GDA0004019558740000072
$x(t_0) = x_0$
The coefficient matrices of the randomly generated stable multi-UAV system are as follows:
Figure GDA0004019558740000073
Figure GDA0004019558740000074
The initial value of the system state is $x_0 = [1, 1, 1, 1]^T$.
As shown in fig. 2, in the structural block diagram of the multi-agent closed-loop dynamic game problem in which N players participate, the control input of each player at Nash equilibrium depends at every moment not only on that player's own state but is also influenced by the control inputs of the other players in the game system. The cooperative game strategy of each player is obtained by minimizing an objective function; the objective function of each player's cooperative game strategy is:
Figure GDA0004019558740000075
where $i, j \in \{1, \dots, N\}$, $N$ denotes the number of players, and the cooperative characterization matrices $Q_i$ and $R_{ij}$ represent the cooperative game strategy adopted by the $i$-th player in the game;
According to the cooperative game strategy objective function of each player, the system state in the game and the optimal control input of the agents of the $i$-th player must satisfy the following equations, namely the optimal game strategy equations:
Figure GDA0004019558740000081
Figure GDA0004019558740000082
Figure GDA0004019558740000083
where $x^*(t)$ and
Figure GDA0004019558740000084
are the optimal system state and the optimal control input of the agents of the $i$-th player, respectively; $P_i$ is the solution of the coupled algebraic Riccati equations;
Figure GDA0004019558740000085
denotes the optimal feedback matrix of the $i$-th player.
In this embodiment, the solution $P_i$ of the coupled algebraic Riccati equations is obtained by an iterative algorithm, as follows:
①:
Figure GDA0004019558740000086
②: From
Figure GDA0004019558740000087
obtain
Figure GDA0004019558740000088
③:
Figure GDA0004019558740000089
④: If $j < N$, return to step ②;
⑤: If $e < \varepsilon$,
Figure GDA00040195587400000810
and stop;
⑥: Set $k = k + 1$, $j = 1$, $e = 0$, and return to step ②.
From the obtained solution $P_i$ of the coupled algebraic Riccati equations and the equations that the system state and the optimal control input of the agents of the $i$-th player must satisfy, the optimal system state and the optimal control input of each player's agents under Nash equilibrium are obtained.
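One common way to solve coupled algebraic Riccati equations numerically is a Lyapunov iteration: fix all feedback gains, solve one linear Lyapunov equation per player, update the gains, and repeat. The sketch below illustrates that idea and is not necessarily the iteration used in this embodiment. It assumes the standard feedback-Nash form $0 = A_c^T P_i + P_i A_c + Q_i + \sum_j K_j^T R_{ij} K_j$ with $K_j = R_{jj}^{-1} B_j^T P_j$ and $A_c = A - \sum_j B_j K_j$. The two-player test system, the function names, and all numerical values are hypothetical.

```python
import numpy as np

def solve_lyapunov(a_c, m):
    """Solve a_c.T @ P + P @ a_c + m = 0 via Kronecker vectorization."""
    n = a_c.shape[0]
    eye = np.eye(n)
    # Row-major vec: vec(a_c.T P) = kron(a_c.T, I) vec(P),
    #                vec(P a_c)   = kron(I, a_c.T) vec(P).
    lhs = np.kron(a_c.T, eye) + np.kron(eye, a_c.T)
    p = np.linalg.solve(lhs, -m.reshape(-1)).reshape(n, n)
    return 0.5 * (p + p.T)  # symmetrize against round-off

def gains_from(p_list, b_list, r):
    """K_i = R_ii^{-1} B_i^T P_i for every player."""
    return [np.linalg.solve(r[i][i], b.T @ p)
            for i, (b, p) in enumerate(zip(b_list, p_list))]

def coupled_terms(a, b_list, q_list, r, k_list):
    """Closed-loop matrix and the constant term of each player's equation."""
    n_players = len(b_list)
    a_c = a - sum(b_list[j] @ k_list[j] for j in range(n_players))
    m_list = [q_list[i] + sum(k_list[j].T @ r[i][j] @ k_list[j]
                              for j in range(n_players))
              for i in range(n_players)]
    return a_c, m_list

def solve_coupled_are(a, b_list, q_list, r, tol=1e-12, max_iter=500):
    """Lyapunov iteration started from zero gains (assumes A is stable)."""
    n = a.shape[0]
    k_list = [np.zeros((b.shape[1], n)) for b in b_list]
    p_list = [np.zeros((n, n)) for _ in b_list]
    for _ in range(max_iter):
        a_c, m_list = coupled_terms(a, b_list, q_list, r, k_list)
        new_p = [solve_lyapunov(a_c, m) for m in m_list]
        change = max(np.max(np.abs(pn - po)) for pn, po in zip(new_p, p_list))
        p_list = new_p
        k_list = gains_from(p_list, b_list, r)
        if change < tol:
            break
    return p_list, k_list

def coupled_are_residual(a, b_list, q_list, r, p_list):
    """Largest absolute entry of the coupled Riccati residuals."""
    k_list = gains_from(p_list, b_list, r)
    a_c, m_list = coupled_terms(a, b_list, q_list, r, k_list)
    return max(np.max(np.abs(a_c.T @ p + p @ a_c + m))
               for p, m in zip(p_list, m_list))

# Hypothetical two-player example with a stable A.
a = np.array([[-1.0, 0.2], [0.0, -2.0]])
b_list = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
q_list = [np.eye(2), 2.0 * np.eye(2)]
r = [[np.array([[1.0]]), np.array([[0.5]])],
     [np.array([[0.5]]), np.array([[1.0]])]]

p_list, k_list = solve_coupled_are(a, b_list, q_list, r)
residual = coupled_are_residual(a, b_list, q_list, r, p_list)
a_closed = a - sum(b @ k for b, k in zip(b_list, k_list))
```

Convergence of this kind of iteration is not guaranteed for every game. The zero-gain start is valid here only because the example $A$ is already stable, which matches the "randomly generated stable" system described above.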
White Gaussian noise at 30 dB is added to the optimal system state and to the player control inputs to obtain the corresponding noise-corrupted observations.
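The noise injection can be illustrated as follows. The sketch assumes that "30 dB" refers to the signal-to-noise ratio, computed from the mean power of the whole record; the text does not state this. The helper `add_awgn` and the decaying test signal are hypothetical.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise at a given SNR in dB.

    Assumption: the SNR is defined on the mean power of the full record.
    """
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200_000)
x_clean = np.exp(-0.3 * t) * np.cos(2.0 * t)  # stand-in for one state component
x_noisy = add_awgn(x_clean, snr_db=30.0, rng=rng)

# Empirical SNR of the generated record, for a sanity check.
measured_snr_db = 10.0 * np.log10(
    np.mean(x_clean ** 2) / np.mean((x_noisy - x_clean) ** 2)
)
```

Because the state of a stable system decays, a whole-record SNR means the late samples are noisier in relative terms than the early ones. A per-sample SNR definition would behave differently.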
S2: Identifying the estimate $\hat{K}_i$ of the optimal feedback matrix of each player
Based on the noisy system state information $x(kT)$ and the control input information $u_i(kT)$ of each player, the estimate $\hat{K}_i$ of each player's optimal feedback matrix in the game is identified by the least-squares method. The calculation formula is as follows:
Figure GDA00040195587400000813
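A minimal least-squares sketch of this step is shown below. It assumes the feedback law $u_i(kT) = -K_i x(kT)$ and stacks the $M$ observations as columns. The function name and the synthetic data are hypothetical, and the estimator is the ordinary least-squares solution, which may differ in form from the formula above.

```python
import numpy as np

def estimate_feedback_matrix(x_obs, u_obs):
    """Least-squares estimate of K in u(kT) = -K x(kT).

    x_obs: (n, M) observed states, one column per observation point.
    u_obs: (m, M) observed control inputs of one player.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    u_obs = np.asarray(u_obs, dtype=float)
    # min_K ||u_obs + K x_obs||_F  <=>  x_obs.T @ K.T ~= -u_obs.T
    k_t, *_ = np.linalg.lstsq(x_obs.T, -u_obs.T, rcond=None)
    return k_t.T

# Synthetic, noise-free check with a known gain (illustrative values).
rng = np.random.default_rng(1)
k_true = np.array([[0.5, -0.2, 0.1, 0.0]])
x_obs = rng.normal(size=(4, 50))
u_obs = -k_true @ x_obs
k_hat = estimate_feedback_matrix(x_obs, u_obs)
```

The estimate is unique only when the observed states span the full state space, i.e. `x_obs` has rank $n$. A single short trajectory from one initial condition may not satisfy this.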
S3: Establishing a double-layer optimization model;
An optimization model for the inversion identification of the cooperative game strategy is established from the obtained dynamic equation of the system, the optimal game strategy equations, and the optimal feedback matrix of each player:
Figure GDA0004019558740000091
Figure GDA0004019558740000092
where $i, j \in \{1, \dots, N\}$,
Figure GDA0004019558740000093
Figure GDA0004019558740000094
denote positive definite matrices; the optimal estimates of the matrices $Q_i$ and $R_{ij}$ in the optimization model constitute the inversion identification of the cooperative strategy of the multi-agent game system.
S4: solving an optimization model;
(1) Using the relationship between the solution of the coupled algebraic Riccati equations and the optimal feedback matrix estimate $\hat{K}_i$, the optimization model is equivalently converted into a quadratic programming problem:
Figure GDA0004019558740000096
s.t. $f_k(\cdot) \ge 0, \quad k = 1, \dots, K$
wherein
Figure GDA0004019558740000097
$f_k(\cdot)$ denotes the nonlinear constraint terms obtained by converting $Q_i > 0$ and $R_{ij} > 0$, and $K$ denotes the number of nonlinear constraints obtained by the conversion;
(2) A logarithmic barrier function is introduced to convert the nonlinearly constrained optimization problem into an unconstrained optimization problem:
Figure GDA0004019558740000098
(3) The unconstrained optimization problem is solved by Newton's method to obtain the weight matrices $Q_i$ and $R_{ij}$ of each player's cooperative game strategy objective function.
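The barrier-plus-Newton idea in steps (2) and (3) can be illustrated on a toy problem. The sketch below minimizes a convex quadratic subject to simple positivity constraints $\theta_k > 0$, which stand in for the nonlinear constraints $f_k \ge 0$ of the embodiment. The barrier schedule, the backtracking rule, the function name, and the test problem are all assumptions made for illustration.

```python
import numpy as np

def barrier_newton_qp(h, g, theta0, mu=1.0, shrink=0.2, mu_min=1e-10,
                      newton_tol=1e-12):
    """Minimize 0.5*th'H th + g'th subject to th > 0 with a log barrier.

    For each barrier weight mu, the unconstrained function
        phi(th) = 0.5*th'H th + g'th - mu * sum(log(th))
    is minimized by damped Newton steps, then mu is reduced.
    """
    theta = np.asarray(theta0, dtype=float).copy()

    def phi(th, mu_val):
        return 0.5 * th @ h @ th + g @ th - mu_val * np.sum(np.log(th))

    while mu > mu_min:
        for _ in range(50):  # Newton iterations for the current mu
            grad = h @ theta + g - mu / theta
            hess = h + np.diag(mu / theta ** 2)
            step = np.linalg.solve(hess, -grad)
            decrement = -grad @ step  # squared Newton decrement
            if decrement < newton_tol:
                break
            t = 1.0
            # Backtrack until the point is strictly feasible and phi
            # decreases sufficiently (Armijo condition).
            while True:
                cand = theta + t * step
                if np.all(cand > 0) and (
                    phi(cand, mu) <= phi(theta, mu) - 0.25 * t * decrement
                ):
                    break
                t *= 0.5
            theta = cand
        mu *= shrink
    return theta

# Toy problem: minimize (th1 - 1)^2 + (th2 + 1)^2 subject to th > 0.
# The constrained minimizer is (1, 0), with the second constraint active.
h = 2.0 * np.eye(2)
g = np.array([-2.0, 2.0])
theta = barrier_newton_qp(h, g, theta0=[1.0, 1.0])
```

The barrier keeps every iterate strictly inside the feasible set, so an active constraint is only approached as `mu` shrinks. This is why the second component ends slightly above zero, not exactly at zero.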
S5: Verifying the accuracy of the weight matrices $Q_i$ and $R_{ij}$ obtained by inversion identification;
The weight matrices $Q_i$ and $R_{ij}$ obtained by inversion identification are substituted into the linear-quadratic closed-loop dynamic game problem, which is solved again to obtain the optimal system state under Nash equilibrium. The accuracy of the inversion identification algorithm is then verified through the relative error level, calculated as follows:
Figure GDA0004019558740000101
$e_{\max} = \max(e_1, e_2, \dots, e_n)$
wherein
Figure GDA0004019558740000102
denotes the $j$-th component of the estimated state at time $kT$, and $e_{\max}$ indicates the relative error level of the algorithm. In this embodiment, the accuracy under noise-free conditions and the robustness under 30 dB white Gaussian noise are verified separately:
(1) Accuracy verification of the inversion identification method under noise-free conditions: in this example, 100 randomly generated sets of $Q_i$ and $R_{ij}$ are used for numerical verification. The resulting distribution and histogram of the relative error of the system state prediction are shown in figs. 3 and 4. Under noise-free conditions the relative estimation error of the system state is less than $10^{-9}$, i.e. the proposed cooperative strategy inversion identification method for non-cooperative games of multi-agent systems is accurate in the noise-free case.
(2) Robustness verification of the inversion identification method under 30 dB white Gaussian noise: in this example, 100 randomly generated sets of $Q_i$ and $R_{ij}$ are used for numerical verification. The resulting distribution and histogram of the relative error of the system state prediction are shown in figs. 5 and 6. Under 30 dB white Gaussian noise, the probability that the relative estimation error of the system state is less than 0.1 reaches 90%, i.e. the proposed method has a certain robustness under noise interference.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.

Claims (7)

1. A cooperative strategy inversion identification method based on multi-agent game is provided, which is a multi-agent closed loop dynamic game decision model for multi-player participation, and is characterized by comprising the following steps:
S1: acquiring system state information with noise and control input information of each player;
collecting mixed signals generated by a multi-agent game system, separating the collected mixed signals, and further obtaining system state information and control input information of each agent through calculation, wherein the system state information and the control input information of each agent are as follows:
$x(kT) = x^*(kT) + v(kT), \quad k = 1, \dots, M$
$u_i(kT) = u_i^*(kT) + w_i(kT), \quad k = 1, \dots, M$
where $k$ denotes the $k$-th observation point, $T$ denotes the observation period, $M$ denotes the total number of observation points in a given time period, $x(kT)$ and $u_i(kT)$ denote the observed system motion state and the observed control input of the $i$-th agent, respectively, and $v(kT)$ and $w_i(kT)$ denote the observation noise at the corresponding time, respectively;
s2: identifying an optimal feedback matrix for each player;
according to the obtained system state information and the control input information of each agent, the estimation value of the optimal feedback matrix of each agent in the game is identified
Figure QLYQS_2
S3: constructing a double-layer optimization model;
acquiring a dynamic equation of the multi-agent game system, and establishing a double-layer optimization model according to an optimal game strategy equation and an inverse identification problem of each player optimal feedback matrix obtained by identification on a cooperative game strategy, wherein the double-layer optimization model is as follows:
Figure QLYQS_3
Figure QLYQS_4
Figure QLYQS_5
where $i, j \in \{1, \dots, N\}$, $N$ denotes the number of players,
Figure QLYQS_6
Figure QLYQS_7
denote positive definite matrices, and the optimal estimates of $Q_i$ and $R_{ij}$ constitute the inversion identification of the cooperative strategy of the multi-agent game system.
S4: solving the constructed optimization model to obtain an inversion identification result;
and converting the double-layer optimization model into a quadratic programming problem to solve.
S5: carrying out accuracy verification on the inversion identification result;
and verifying the accuracy of the result by calculating the relative error between the real value and the predicted value of the system state.
2. The multi-agent game-based cooperative strategy inversion identification method according to claim 1, wherein the separation of the collected mixed signals in step S1 comprises the following steps:
A. under the condition of sufficient prior knowledge, designing each intelligent agent signal data separator of each player by a maximum posterior probability method or a principal component analysis method to realize signal separation;
B. under the condition of insufficient prior knowledge, multi-target data are separated through an ICA-based blind signal separation algorithm.
3. The multi-agent gaming-based cooperative strategy inversion identification method of claim 1, wherein the step S2 of obtaining noisy system state information and control input information of each player comprises the steps of:
s01: obtaining the cooperative game strategy of each player according to the objective function of each player;
firstly, each player finds the cooperative game strategy through an objective function shown in a minimization formula, wherein the objective function is as follows:
Figure QLYQS_8
Figure QLYQS_9
x(t_0) = x_0
wherein i, j ∈ {1, …, N}, N represents the number of players, and the matrices Q_i and R_ij jointly characterize the cooperative game strategy adopted by the i-th player in the game;
S02: obtaining the optimal system state and each player's control input under the Nash equilibrium of the system, and generating the corresponding noise-corrupted observations;
each player executes its own cooperative control strategy by solving the coupled algebraic Riccati equations, wherein the calculation formula is as follows:
Figure QLYQS_10
wherein:
Figure QLYQS_11
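The generation of noise-corrupted observations in step S02 can be illustrated with a short simulation. This is a sketch under stated assumptions, since the patent's own formulas are only available as figures: it assumes linear dynamics dx/dt = A x + Σ B_i u_i, linear state feedback u_i = -K_i x with gains K_i already obtained from the Riccati solution, and additive Gaussian observation noise. The function name `simulate_observations` and all parameter names are hypothetical.

```python
import numpy as np

def simulate_observations(A, B_list, K_list, x0, T, M, noise_std, rng, substeps=100):
    """Sample noisy x(kT) and u_i(kT) from the closed-loop system.

    Assumed model (not taken from the patent figures):
        dx/dt = A x + sum_i B_i u_i,   u_i = -K_i x.
    Returns x_obs of shape (n, M) and a list of u_obs arrays of shape (p_i, M).
    """
    # Closed-loop matrix under all players' feedback laws.
    A_cl = A - sum(B @ K for B, K in zip(B_list, K_list))
    dt = T / substeps
    x = np.asarray(x0, dtype=float)
    xs, us = [], [[] for _ in K_list]
    for _ in range(M):
        # Record the state and each control input with additive noise.
        xs.append(x + noise_std * rng.standard_normal(x.shape))
        for i, K in enumerate(K_list):
            u = -K @ x
            us[i].append(u + noise_std * rng.standard_normal(u.shape))
        # Advance one observation period with classical RK4 steps.
        for _ in range(substeps):
            k1 = A_cl @ x
            k2 = A_cl @ (x + 0.5 * dt * k1)
            k3 = A_cl @ (x + 0.5 * dt * k2)
            k4 = A_cl @ (x + dt * k3)
            x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return np.array(xs).T, [np.array(u).T for u in us]
```

Passing a seeded generator such as `np.random.default_rng(0)` as `rng` makes the noise reproducible.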
4. The cooperative strategy inversion identification method based on multi-agent gaming as claimed in claim 1, wherein in step S2, the estimate of the optimal feedback matrix is obtained by the least-squares identification method from the noisy system state information and the control input information of each agent, and the calculation formula is as follows:
Figure QLYQS_12
wherein
Figure QLYQS_13
is the estimate of the optimal feedback matrix, K_i is the feedback matrix of the i-th player, k denotes the k-th observation point, T denotes the observation period, M denotes the total number of observation points within a given period, x(kT) and u_i(kT) denote the observed system motion state and the control input of the i-th agent, respectively, and v(kT) and w_i(kT) denote the observation noise at the corresponding times.
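The least-squares step of this claim can be sketched as follows. The exact estimator is given only as a figure, so this is an illustration under the assumption that the control law is u_i(kT) = -K_i x(kT) (the sign convention is assumed) and that the estimate minimizes the summed squared residual over the M observation points. The function name `estimate_feedback` and the array layout are hypothetical.

```python
import numpy as np

def estimate_feedback(x_obs, u_obs):
    """Least-squares estimate of K_i from noisy samples.

    x_obs: (n, M) observed states x(kT) + v(kT)
    u_obs: (p, M) observed inputs u_i(kT) + w_i(kT) of one player
    Assumes u_i = -K_i x, so K_i minimizes sum_k ||u_i(kT) + K_i x(kT)||^2.
    """
    # lstsq solves x_obs.T @ S ≈ -u_obs.T for S of shape (n, p); K_i = S.T.
    sol, *_ = np.linalg.lstsq(x_obs.T, -u_obs.T, rcond=None)
    return sol.T
```

Because the state samples are themselves noisy, ordinary least squares is slightly biased. The bias shrinks as the noise variance becomes small relative to the state energy.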
5. The cooperative strategy inversion identification method based on multi-agent gaming as claimed in claim 1, wherein in step S3, the obtaining of the dynamic equation of the multi-agent gaming system comprises the following steps:
A. if prior information about the agents of each player participating in the game is available, determining the agent models by observation to obtain the system dynamic equation;
B. if no prior information about the agents of the players participating in the game is available, identifying the dynamic equations and control inputs of the participating agents by a blind system identification method for multi-input multi-output systems to obtain the system dynamic equation;
the dynamic equation is as follows:
Figure QLYQS_14
wherein i ∈ {1, …, N}, and N represents the number of players participating in the game; x(t) ∈ R^n represents the state quantity of the whole system; x_i(t) represents the state information of the i-th player's agents; u_i(t) represents the control input information of the i-th player's agents;
Figure QLYQS_15
represents the control input information of the m-th agent of the i-th player, and B_i is a diagonal matrix.
6. The multi-agent game-based cooperative strategy inversion identification method according to claim 1, wherein in step S4, the model is solved by the following transformation process:
according to the relationship between the solution of the coupled algebraic Riccati equations and the optimal feedback matrix
Figure QLYQS_16
the inner-layer coupled algebraic Riccati equations are solved and the model is equivalently transformed into a quadratic programming problem, as follows:
Figure QLYQS_17
s.t. Q_i > 0
R_ij > 0
Figure QLYQS_18
Figure QLYQS_19
wherein I_n is the n × n identity matrix and I_p is the p × p identity matrix.
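To see why fixing the feedback gain makes the inverse problem tractable, consider the scalar single-player special case. This is a worked illustration only, not the quadratic program of the claim. It assumes the standard LQ setting dx/dt = a x + b u with cost ∫(q x² + r u²) dt, gain k = b p / r, and Riccati equation 2 a p - b² p² / r + q = 0. The weight r is fixed by the caller because cost weights are identifiable only up to a common scale. Both function names are hypothetical.

```python
import math

def forward_gain(a, b, q, r):
    """Optimal scalar feedback gain k (u = -k x) for the assumed LQ problem."""
    # Stabilizing (positive) root of 2*a*p - b^2*p^2/r + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

def inverse_weights(a, b, k, r=1.0):
    """Recover (q, p) from an observed gain k with r fixed as normalization."""
    p = r * k / b                      # from k = b * p / r
    q = b * b * p * p / r - 2 * a * p  # Riccati equation solved for q
    return q, p
```

Once k is known, p follows linearly from the gain relation and q follows from the Riccati equation, which is the structure the transformation in this claim exploits in the matrix, multi-player case.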
7. The multi-agent game-based cooperative strategy inversion identification method according to claim 1, wherein in step S5, the error is calculated by the following method:
Figure QLYQS_20
e_max = max(e_1, e_2, …, e_n)
wherein
Figure QLYQS_21
represents the j-th component of the estimated state quantity at time kT, e_max represents the relative error level of the algorithm, k denotes the k-th observation point, T denotes the observation period, and M denotes the total number of observation points within a given period.
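The error check of this claim can be sketched as follows. The per-component formula is available only as a figure, so this sketch assumes e_j is the ratio of the 2-norm of the prediction error to the 2-norm of the true trajectory for component j, taken over the M observation points. The function name `relative_errors` and the (n, M) array layout are hypothetical.

```python
import numpy as np

def relative_errors(x_true, x_pred):
    """Per-component relative error e_j and the overall level e_max.

    x_true, x_pred: arrays of shape (n, M), row j holding the j-th state
    component at the M observation times (layout assumed for this sketch).
    """
    # Assumed definition: e_j = ||x_j - x_hat_j||_2 / ||x_j||_2 over k = 1..M.
    e = np.linalg.norm(x_true - x_pred, axis=1) / np.linalg.norm(x_true, axis=1)
    return e, e.max()
```

A small e_max indicates that the state trajectory regenerated from the identified Q_i and R_ij closely tracks the observed one.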
CN202011236015.XA 2020-11-09 2020-11-09 Cooperative strategy inversion identification method based on multi-agent game Active CN112270103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011236015.XA CN112270103B (en) 2020-11-09 2020-11-09 Cooperative strategy inversion identification method based on multi-agent game

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011236015.XA CN112270103B (en) 2020-11-09 2020-11-09 Cooperative strategy inversion identification method based on multi-agent game

Publications (2)

Publication Number Publication Date
CN112270103A CN112270103A (en) 2021-01-26
CN112270103B true CN112270103B (en) 2023-04-11

Family

ID=74339826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011236015.XA Active CN112270103B (en) 2020-11-09 2020-11-09 Cooperative strategy inversion identification method based on multi-agent game

Country Status (1)

Country Link
CN (1) CN112270103B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258219B (en) * 2020-01-19 2022-05-03 Beijing Institute of Technology Inversion identification method for multi-agent system cooperation strategy
CN113867418B (en) * 2021-09-17 2022-06-17 Nanjing University of Information Science and Technology Unmanned aerial vehicle cluster autonomous cooperative scout task scheduling method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258219A (en) * 2020-01-19 2020-06-09 Beijing Institute of Technology Inversion identification method for multi-agent system cooperation strategy

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9737988B2 (en) * 2014-10-31 2017-08-22 Intelligent Fusion Technology, Inc Methods and devices for demonstrating three-player pursuit-evasion game
CN107608366B (en) * 2017-09-01 2021-02-05 Ningbo University Multi-wing umbrella unmanned aerial vehicle system based on event trigger
CN108958032B (en) * 2018-07-24 2021-09-03 Hunan University of Technology Total amount cooperative and consistent control method of nonlinear multi-agent system
CN111275174B (en) * 2020-02-13 2020-09-18 Unit 32802 of the Chinese People's Liberation Army Game-oriented radar countermeasure generating method

Also Published As

Publication number Publication date
CN112270103A (en) 2021-01-26

Similar Documents

Publication Publication Date Title
CN112270103B (en) Cooperative strategy inversion identification method based on multi-agent game
CN112180967B (en) Multi-unmanned aerial vehicle cooperative countermeasure decision-making method based on evaluation-execution architecture
CN113435644B (en) Emergency prediction method based on deep bidirectional long-short term memory neural network
Shi et al. Lateral transfer learning for multiagent reinforcement learning
CN112198892B (en) Multi-unmanned aerial vehicle intelligent cooperative penetration countermeasure method
CN112180724A (en) Training method and system for multi-agent cooperative cooperation under interference condition
CN114358141A (en) Multi-agent reinforcement learning method oriented to multi-combat-unit cooperative decision
CN111753300B (en) Method and device for detecting and defending abnormal data for reinforcement learning
CN111258219B (en) Inversion identification method for multi-agent system cooperation strategy
CN116225049A (en) Multi-unmanned plane wolf-crowd collaborative combat attack and defense decision algorithm
CN115544714A (en) Time sequence dynamic countermeasure threat assessment method based on aircraft formation
CN106507275A (en) A kind of robust Distributed filtering method and apparatus of wireless sensor network
CN114679729A (en) Radar communication integrated unmanned aerial vehicle cooperative multi-target detection method
Friedrich et al. Neural optimal feedback control with local learning rules
CN113894780A (en) Multi-robot cooperative countermeasure method and device, electronic equipment and storage medium
Liu et al. Optimal DoS attack scheduling for multi-sensor remote state estimation over interference channels
Stella et al. Bio-inspired evolutionary game dynamics in symmetric and asymmetric models
CN117009811A (en) Multi-agent training method and system based on reinforcement learning
CN114866272B (en) Multi-round data delivery system of true value discovery algorithm in crowd-sourced sensing environment
CN116301042A (en) Unmanned aerial vehicle group autonomous control method based on VGG16 and virtual game
CN114757092A (en) System and method for training multi-agent cooperative communication strategy based on teammate perception
CN114662655A (en) Attention mechanism-based weapon and chess deduction AI hierarchical decision method and device
CN114170338A (en) Image generation method based on adaptive gradient clipping under differential privacy protection
CN113807230A (en) Equipment target identification method based on active reinforcement learning and man-machine intelligent body
CN112926746A (en) Decision-making method and device for multi-agent reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant