CN112270103B

CN112270103B - Cooperative strategy inversion identification method based on multi-agent game

Info

Publication number: CN112270103B
Application number: CN202011236015.XA
Authority: CN
Inventors: 俞成浦; 张振华; 李尧
Original assignee: Chongqing Innovation Center of Beijing University of Technology
Current assignee: Chongqing Innovation Center of Beijing University of Technology
Priority date: 2020-11-09
Filing date: 2020-11-09
Publication date: 2023-04-11
Anticipated expiration: 2040-11-09
Also published as: CN112270103A

Abstract

The invention discloses a cooperative strategy inversion identification method based on a multi-agent game. The method targets a closed-loop multi-agent dynamic game decision model in which multiple players participate. Each player detects and monitors a multi-unmanned-aerial-vehicle (multi-UAV) game system by radar, acquires and separates the mixed signal generated by the system, performs inversion modeling from the acquired system state information and the control input information of each agent, and identifies the cooperative game strategy characterization matrices of each agent in the game.

Description

Cooperative strategy inversion identification method based on multi-agent game

Technical Field

The invention relates to the technical field of system identification and parameter estimation, in particular to a cooperative strategy inversion identification method based on a multi-agent game.

Background

Inverse problems are a broad class of problems concerned with converting observations or measurements into information about their causes, i.e., determining the parameters (or model parameters) that characterize a problem from observations and some general principles (or models). Inversion identification can recover mechanism-characterizing parameters that cannot be observed directly, and it is therefore widely used in control, communication, medicine, geophysics and other fields. By mechanism, inversion can be divided into linear inversion, generalized linear inversion, nonlinear inversion, iterative inversion, optimization-based inversion, and so on.

Cooperative strategy inversion identification based on multi-agent games is of considerable theoretical interest and has important military applications. In actual combat, for example, the agents' environment is uncertain, decision information is incomplete, and communication between agents is limited. The swarm behavior of multi-party intelligent combat units may therefore better fit an implicit cooperative decision mode. Imitating the way humans cooperate, the agents do not rely on direct interaction but make implicit cooperative decisions through a role-based implicit cooperation framework, which can be approximated by a non-cooperative dynamic game cooperative decision framework for multi-agent systems. If the cooperative game strategies can be identified inversely by observing the outcomes of the players participating in the game, the next actions of the other players can be predicted and countered in advance, improving one's own winning rate.

For the dynamic game inversion problem, researchers in China and abroad have carried out some targeted studies. In "Inverse Optimal Control for Identification in Non-Cooperative Differential Games" (2017), Simon Rothfuß et al. used a driver assistance system as an example. In a human-machine cooperation setting where the assistance system's cooperative game strategy is known, they modeled the human's cooperative game strategy through dynamic game inversion identification. In "Inverse Reinforcement Learning for Identification in Linear-Quadratic Dynamic Games" (2017), Florian et al. studied a two-player discrete-time closed-loop game system. They used the coupled algebraic Riccati equations as optimization constraints and, assuming one player's cooperative game strategy is known, designed an inverse identification method for a real ball-and-beam model. In "Inverse Open-Loop Noncooperative Differential Games and Inverse Optimal Control" (2019), Timothy L. Molloy et al. proposed two finite-horizon open-loop noncooperative differential game inversion algorithms based on the minimum principle. These algorithms achieved high identification accuracy in two-agent three-dimensional collision-avoidance game examples.

At present, existing optimization-based inversion modeling methods for cooperative strategy inversion identification in multi-agent game systems still have the following problems:

1. they are difficult to apply to decision models of closed-loop dynamic game systems in which multiple players participate;

2. they require complete knowledge of one player's cooperative game strategy, which limits the applicability of the inversion optimization model;

3. their inversion identification accuracy is insufficient under noise-free conditions, and their robustness is poor under noise interference.

Disclosure of Invention

To solve these problems, the invention provides a cooperative strategy inversion identification method based on a multi-agent game. The method addresses the decision model of a closed-loop dynamic game system with multiple participating players and proceeds in three stages. First, it performs inversion modeling using the acquired motion states and control input information of the multi-UAV game system as input. Next, it obtains the cooperative game strategy characterization matrices $Q_i$ and $R_{ij}$ through inversion identification. Finally, it uses the identified weight matrices $Q_i$ and $R_{ij}$ to re-solve the Nash equilibrium of the forward game problem, which verifies the effectiveness of the algorithm.

The specific technical scheme of the cooperative strategy inversion identification method based on a multi-agent game provided by the invention is as follows:

S1: acquire the noisy system state information and the control input information of each player;

The mixed signals generated by the multi-agent game system are collected and separated. The system state information and the control input information of each agent are then obtained by calculation. Their general forms are:

$$ x(kT) = x^*(kT) + v(kT), \quad k = 1, \dots, M $$

$$ u_i(kT) = u_i^*(kT) + w_i(kT), \quad k = 1, \dots, M $$

where k denotes the k-th observation point, T is the observation period, and M is the total number of observation points in the period. $x(kT)$ and $u_i(kT)$ are the observed system motion state and the observed control input of the i-th agent, and $v(kT)$ and $w_i(kT)$ are the corresponding observation noises.

S2: identify the optimal feedback matrix of each player;

From the acquired system state information and the control input information of each agent, identify an estimate $\hat{K}_i$ of the optimal feedback matrix of each agent in the game.

S3: construct the bilevel optimization model;

Acquire the dynamic equation of the multi-agent game system. Then establish a bilevel optimization model from three ingredients: the optimal game strategy equation, the identified optimal feedback matrix of each player, and the inverse identification problem of the cooperative game strategy. The bilevel optimization model is as follows:

where i, j ∈ {1, ..., N} and N is the number of players. $Q_i$ and $R_{ij}$ are positive definite matrices, and their optimal estimates constitute the inversion identification of the cooperative strategy of the multi-agent game system.

S4: solving the constructed optimization model to obtain an inversion identification result;

The bilevel optimization model is converted into a quadratic programming problem and solved.

S5: carrying out accuracy verification on the inversion identification result;

The accuracy of the result is verified by computing the relative error between the true and predicted values of the system state.

Further, in step S1, separating the collected mixed signals comprises the following steps:

A. when sufficient prior knowledge is available, a signal data separator is designed for each agent of each player by maximum a posteriori (MAP) estimation or principal component analysis (PCA) to achieve signal separation;

B. when prior knowledge is insufficient, the multi-target data are separated by a blind signal separation algorithm based on independent component analysis (ICA).
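The ICA route in case B can be illustrated with a minimal symmetric FastICA sketch in NumPy. The text does not name a specific ICA variant. FastICA with the tanh nonlinearity, the function names, and the synthetic sources and mixing matrix in the usage lines are all assumptions for illustration. Like any ICA method, it recovers the sources only up to order, sign and scale.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fast_ica(X, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA with the tanh (log-cosh) contrast.

    X: (n_channels, n_samples) mixed observations.
    Returns estimated sources (n_channels, n_samples), up to order, sign and scale.
    """
    X = X - X.mean(axis=1, keepdims=True)
    # Whitening: Z = C^(-1/2) X has identity sample covariance.
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X
    n, p = Z.shape
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(Wz) z^T] - diag(E[g'(Wz)]) W, with g' = 1 - tanh^2.
        W_new = _sym_decorrelate(G @ Z.T / p - np.diag((1.0 - G**2).mean(axis=1)) @ W)
        # Converged when each new row is (up to sign) parallel to the old one.
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z


if __name__ == "__main__":
    t = np.linspace(0, 8, 4000)
    S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])  # hypothetical sources
    X = np.array([[1.0, 0.5], [0.4, 1.0]]) @ S             # hypothetical mixing
    Y = fast_ica(X)
    for s in S:
        print(max(abs(np.corrcoef(s, y)[0, 1]) for y in Y))  # close to 1
```

In the radar setting, each row of `X` would be one received channel. The correlation check in the usage lines only works here because the synthetic sources are known.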

Further, in step S1, acquiring the noisy system state information and the control input information of each player comprises the following steps:

S01: obtain the cooperative game strategy of each player from its objective function;

First, each player determines its cooperative game strategy by minimizing the following objective function:

$$ x(t_0) = x_0 $$

where i, j ∈ {1, ..., N} and N is the number of players. The cooperative characterization matrices $Q_i$ and $R_{ij}$ represent the cooperative game strategy adopted by the i-th player in the game;

S02: obtain the optimal system state at the Nash equilibrium of the system and the control input of each player, and generate the corresponding noise-corrupted observations;

Each player executes its own cooperative game control strategy by solving the coupled algebraic Riccati equations. The calculation formula is as follows:

wherein:

Further, in step S2, the estimate of the optimal feedback matrix is obtained by least-squares identification from the noisy system state information and the control input information of each agent. The calculation formula is as follows:

where $\hat{K}_i$ is the estimate of the optimal feedback matrix and $K_i$ is the feedback matrix of the i-th player.

Feedback matrix K of the i-th player _i The calculation formula is as follows:

u _i (t)＝-K _i x(t)
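The least-squares formula itself was not preserved. Stack the M samples as columns of $X = [x(T), \dots, x(MT)]$ and $U_i = [u_i(T), \dots, u_i(MT)]$. The model $u_i(kT) = -K_i x(kT)$ then gives $\hat{K}_i = -U_i X^\top (X X^\top)^{-1}$, provided $X X^\top$ is invertible. That requires the states to excite all n directions. A single noise-free closed-loop trajectory may not do so, in which case data from several initial states would be needed. A NumPy sketch follows; the function name is hypothetical.

```python
import numpy as np


def estimate_feedback_gain(X, U):
    """Least-squares estimate of K in u(kT) = -K x(kT) + noise.

    X: (n, M) state samples x(T), ..., x(MT) stacked as columns.
    U: (m, M) control samples u_i(T), ..., u_i(MT) stacked as columns.
    Returns K_hat (m, n), equal to -U X^T (X X^T)^(-1) when X X^T is invertible.
    """
    # Solve X^T K^T = -U^T in the least-squares sense; lstsq avoids forming
    # the inverse of X X^T explicitly.
    K_T, *_ = np.linalg.lstsq(X.T, -U.T, rcond=None)
    return K_T.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K_true = np.array([[3.0, 0.0, 1.0, -0.5]])   # hypothetical 1-input, 4-state gain
    X = rng.standard_normal((4, 200))             # hypothetical state samples
    U = -K_true @ X + 1e-3 * rng.standard_normal((1, 200))
    print(np.round(estimate_feedback_gain(X, U), 2))
```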


Further, in step S3, obtaining the dynamic equation of the multi-agent game system comprises the following steps:

A. if prior information about the agents of each participating player is available, the agent models are distinguished by observation, and the system dynamic equation is obtained;

B. if no prior information about the agents of the participating players is available, the dynamic equations and control inputs of the agents are identified by a blind system identification method for multi-input multi-output (MIMO) systems, which yields the dynamic equation of the system.

The dynamic equation is as follows:

$u_i(t)$ and $B_{ii}$ are given as follows:

where i ∈ {1, ..., N} and N is the number of players participating in the game. $x(t) \in \mathbb{R}^n$ is the state of the whole system, $x_i(t)$ is the state information of the i-th player's agents, and $u_i(t)$ is the control input information of the i-th player's agents.

The m-th component of $u_i(t)$ is the control input of the i-th player's m-th agent, and $B_{ii}$ is a diagonal matrix.

Further, in step S4, the model is converted and solved as follows:

Using the relationship between the solution of the inner-layer coupled algebraic Riccati equations and the optimal feedback matrix, the model is equivalently transformed into a quadratic programming problem:

$$ \text{s.t.}\quad Q_i \succ 0, \quad R_{ij} \succ 0 $$

where $G$ and $D_i$ are the coefficient matrices of the quadratic program, $I_n$ is the n × n identity matrix, and $I_p$ is the p × p identity matrix.
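The equivalent conversion can be illustrated under stated assumptions. Fix the feedback estimates $\hat{K}_j$; the closed-loop matrix $A_{cl} = A - \sum_j B_j \hat{K}_j$ is then known. $P_i$ depends linearly on $(Q_i, R_{ij})$ through a Lyapunov equation, written here with the Kronecker operator $G = I_n \otimes A_{cl}^\top + A_{cl}^\top \otimes I_n$. The mismatch $R_{ii}\hat{K}_i - B_i^\top P_i$ is therefore linear in the unknown weights, and its squared norm $\theta_i^\top D_i^\top D_i \theta_i$ is a quadratic objective.

The parameterization, the residual and the function names below are hypothetical reconstructions; the document's own $G$ and $D_i$ may be defined differently. Scaling all weights by the same factor leaves $K_i$ unchanged, so a normalization (for example fixing $R_{ii} = I$) would be needed in addition to the positive-definiteness constraints.

```python
import numpy as np


def sym_basis(k):
    """Basis of symmetric k x k matrices, one element per upper-triangular entry."""
    basis = []
    for r in range(k):
        for c in range(r, k):
            E = np.zeros((k, k))
            E[r, c] = E[c, r] = 1.0
            basis.append(E)
    return basis


def residual_matrix(A, B, K_hat, i):
    """Matrix D_i with r_i = D_i @ theta_i, where theta_i stacks the symmetric
    coordinates of Q_i, R_i1, ..., R_iN (in that order) and
        r_i = vec(R_ii K_hat_i - B_i^T P_i),
        A_cl^T P_i + P_i A_cl + Q_i + sum_j K_hat_j^T R_ij K_hat_j = 0,
        A_cl = A - sum_j B_j K_hat_j.
    """
    n, m_i = A.shape[0], K_hat[i].shape[0]
    A_cl = A - sum(Bj @ Kj for Bj, Kj in zip(B, K_hat))
    # vec(A_cl^T P + P A_cl) = G vec(P), using column-major vec.
    G = np.kron(np.eye(n), A_cl.T) + np.kron(A_cl.T, np.eye(n))
    # Each term: (contribution to the Lyapunov right-hand side, contribution to R_ii).
    terms = [(E, np.zeros((m_i, m_i))) for E in sym_basis(n)]       # Q_i coordinates
    for j, Kj in enumerate(K_hat):                                  # R_ij coordinates
        for E in sym_basis(Kj.shape[0]):
            terms.append((Kj.T @ E @ Kj, E if j == i else np.zeros((m_i, m_i))))
    cols = []
    for M, R_ii_part in terms:
        P = np.linalg.solve(G, -M.reshape(-1, order="F")).reshape(n, n, order="F")
        cols.append((R_ii_part @ K_hat[i] - B[i].T @ P).ravel())
    return np.column_stack(cols)


if __name__ == "__main__":
    # Hypothetical two-player example: its Nash gains for Q_1 = diag(3, 1),
    # Q_2 = diag(1, 4) and all R_ij = 1 are K_1 = [3, 0] and K_2 = [0, 2].
    A = np.diag([1.0, 0.0])
    B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
    K_hat = [np.array([[3.0, 0.0]]), np.array([[0.0, 2.0]])]
    D1 = residual_matrix(A, B, K_hat, 0)
    theta_true = np.array([3.0, 0.0, 1.0, 1.0, 1.0])  # Q_1 coords, R_11, R_12
    print(np.abs(D1 @ theta_true).max())               # essentially zero
```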

Further, in step S5, the error is calculated as follows:

$$ e_{\max} = \max(e_1, e_2, \dots, e_n) $$

where $\hat{x}_j(kT)$ denotes the j-th component of the estimated state at time kT, and $e_{\max}$ indicates the relative error level of the algorithm.
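The formula for $e_j$ was not preserved. The sketch below assumes that $e_j$ is the 2-norm of the prediction error of the j-th state component over the M observation points, divided by the 2-norm of its true trajectory. The function name is hypothetical.

```python
import numpy as np


def relative_error_level(X_true, X_est):
    """Per-component relative errors e_j and their maximum e_max.

    X_true, X_est: (n, M) arrays whose entry [j, k] is the j-th state component
    at the (k+1)-th observation point. Assumed metric:
        e_j = ||x_j - x_hat_j||_2 / ||x_j||_2 over the M observation points.
    """
    e = np.linalg.norm(X_true - X_est, axis=1) / np.linalg.norm(X_true, axis=1)
    return e.max(), e
```

The metric is scale-free per component, so a state that is small in magnitude is not drowned out by larger ones. The trade-off is that it becomes ill-defined for a component that stays identically zero.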

The invention has the following beneficial effects:

1. For a multi-agent game system with more than two players, a cooperative strategy inversion optimization model is established from the system states and player control inputs observed over a period of time. The cooperative game strategy of each agent is obtained by solving this model.

2. An algorithm is designed that equivalently converts the complex bilevel optimization problem with nonlinear constraints into an easily solved quadratic programming problem. The method achieves high identification accuracy without noise and a degree of robustness under noise interference.

Drawings

FIG. 1 is a block diagram of the closed-loop dynamic game architecture of the multi-agent system of the present invention;

FIG. 2 is a schematic diagram of an application scenario of the present invention;

FIG. 3 shows the relative error distribution of the system state prediction under noise-free conditions according to the present invention;

FIG. 4 is a histogram of the system state prediction under noise-free conditions according to the present invention;

FIG. 5 shows the relative error distribution of the system state prediction under 30 dB white Gaussian noise according to the present invention;

FIG. 6 is a histogram of the system state prediction under 30 dB white Gaussian noise according to the present invention.

Detailed Description

In the following description, technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.

The embodiment of the invention provides a cooperative strategy inversion identification method based on a multi-agent game for a closed-loop multi-agent dynamic game system with multiple participating players. As shown in figure 1, in a UAV swarm game scenario with multiple players, the parties have no cooperative relationship with one another. Each player detects and monitors the multi-UAV game system by radar and acquires the mixed signals it generates. By separating the mixed signals and applying a blind system identification method for multi-input multi-output systems, the player obtains the system state, the control input of each UAV, and the dynamic equation of the multi-agent game system.

During the game, every player makes the decisions most beneficial to itself, and the whole system eventually reaches a Nash equilibrium. The UAV swarm system with multiple players can be approximated as a linear-quadratic non-cooperative closed-loop dynamic game system. The optimal system state and the optimal control input of each player are obtained by solving the system's Nash equilibrium. Each player's cooperative game strategy is characterized by the weight matrices $Q_i$ and $R_{ij}$.

The specific technical scheme of the embodiment of the invention is as follows:

S1: acquire the optimal system state at Nash equilibrium and the control input information of each player;

Each player detects and monitors the multi-UAV game system by radar, acquires the mixed signals it generates, and separates them to obtain the system state information and the control input information of each agent. In this embodiment, an ICA-based blind signal separation algorithm separates the multi-target data signals, yielding the following system states and control inputs of each agent:

$$ x(kT) = x^*(kT) + v(kT), \quad k = 1, \dots, M $$

$$ u_i(kT) = u_i^*(kT) + w_i(kT), \quad k = 1, \dots, M $$

From the obtained system state information and the control input information of each agent, the closed-loop dynamic game control system of the multiple UAVs is modeled and its dynamic equation is constructed. In this embodiment, three parties participate in the UAV swarm, and the dynamic equation of the system is:

$$ x(t_0) = x_0 $$

The coefficient matrices of the randomly generated stable multi-UAV system are as follows:

The initial value of the system state is $x_0 = [1, 1, 1, 1]^\top$.

As shown in fig. 2, in the structural block diagram of the closed-loop multi-agent dynamic game with N players, each player's control input at Nash equilibrium at every moment depends on more than its own state. It is also influenced by the control inputs of the other players in the game system. Each player's cooperative game strategy is obtained by minimizing its objective function, which is:

where i, j ∈ {1, ..., N} and N is the number of players. The cooperative characterization matrices $Q_i$ and $R_{ij}$ represent the cooperative game strategy adopted by the i-th player in the game;

Given the players' cooperative game strategy objective functions, the system state and the optimal control input of the i-th player's agents in the game satisfy the following equation, namely the optimal game strategy equation:

where $x^*(t)$ and $u_i^*(t)$ are the optimal system state and the optimal control input of the i-th player's agents, respectively. $P_i$ is the solution of the coupled algebraic Riccati equations, and $K_i$ denotes the optimal feedback matrix of the i-th player.
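The formulas of this step were not preserved. The block below restates, as an assumption, the standard infinite-horizon linear-quadratic feedback Nash game that the surrounding text describes: the dynamics, the cost of player i, the feedback law, and the coupled algebraic Riccati equations written in closed-loop Lyapunov form. The exact weighting factors used in the original may differ.

```latex
\begin{aligned}
&\dot{x}(t) = A x(t) + \sum_{j=1}^{N} B_j u_j(t), \qquad x(t_0) = x_0 \\
&J_i = \int_{t_0}^{\infty} \Big( x^\top Q_i x + \sum_{j=1}^{N} u_j^\top R_{ij} u_j \Big)\, dt \\
&u_i^*(t) = -K_i x^*(t), \qquad K_i = R_{ii}^{-1} B_i^\top P_i \\
&A_{cl} = A - \sum_{j=1}^{N} B_j K_j \\
&0 = A_{cl}^\top P_i + P_i A_{cl} + Q_i + \sum_{j=1}^{N} K_j^\top R_{ij} K_j, \qquad i = 1, \dots, N
\end{aligned}
```

Define $S_j = B_j R_{jj}^{-1} B_j^\top$ and $S_{ij} = B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^\top$. Substituting $K_j = R_{jj}^{-1} B_j^\top P_j$ into the last line and separating the $j = i$ term recovers the familiar coupled Riccati form:

$0 = (A - \sum_{j \ne i} S_j P_j)^\top P_i + P_i (A - \sum_{j \ne i} S_j P_j) - P_i S_i P_i + Q_i + \sum_{j \ne i} P_j S_{ij} P_j$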

In this embodiment, the solutions $P_i$ of the coupled algebraic Riccati equations are computed with an iterative algorithm, as follows:

(1): …

(2): from …, obtain …;

(3): …

(4): if j < N, return to step (2);

(5): if e < ε, terminate;

(6): set k = k + 1, j = 1, e = 0, and return to step (2).
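The six steps above survive only as control flow: a sweep index j over the players, an accumulated change e, a tolerance ε, and an outer counter k. The sketch below fills them with one possible update, a Lyapunov-type iteration that recomputes each player's $P_i$ from the current closed loop and then refreshes its gain. The update rule, the function names and the requirement of stabilizing initial gains are assumptions rather than the original formulas, and convergence is not guaranteed for every game.

```python
import numpy as np


def solve_lyapunov(A_cl, M):
    """Solve A_cl^T P + P A_cl + M = 0 through the Kronecker (vec) form."""
    n = A_cl.shape[0]
    G = np.kron(np.eye(n), A_cl.T) + np.kron(A_cl.T, np.eye(n))
    P = np.linalg.solve(G, -M.reshape(-1, order="F")).reshape(n, n, order="F")
    return (P + P.T) / 2


def solve_coupled_are(A, B, Q, R, K0, eps=1e-10, max_sweeps=500):
    """Lyapunov-type iteration for the coupled AREs of an N-player LQ game.

    B[i]: (n, m_i); Q[i]: (n, n); R[i][j]: (m_j, m_j);
    K0[i]: (m_i, n) initial gains with A - sum_j B_j K0_j Hurwitz (assumed).
    Each sweep visits players j = 1..N and accumulates the change e in P;
    the iteration stops once a whole sweep changes the P_i by less than eps.
    """
    N, n = len(B), A.shape[0]
    K = [k.copy() for k in K0]
    P = [np.zeros((n, n)) for _ in range(N)]
    for _ in range(max_sweeps):          # outer counter k
        e = 0.0
        for i in range(N):               # inner sweep over the players
            A_cl = A - sum(B[j] @ K[j] for j in range(N))
            M = Q[i] + sum(K[j].T @ R[i][j] @ K[j] for j in range(N))
            P_new = solve_lyapunov(A_cl, M)
            e += np.linalg.norm(P_new - P[i])
            P[i] = P_new
            K[i] = np.linalg.solve(R[i][i], B[i].T @ P_new)  # K_i = R_ii^-1 B_i^T P_i
        if e < eps:
            return P, K
    raise RuntimeError("coupled Riccati iteration did not converge")


if __name__ == "__main__":
    # Hypothetical decoupled two-player example with known Nash gains [3, 0] and [0, 2].
    A = np.diag([1.0, 0.0])
    B = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
    Q = [np.diag([3.0, 1.0]), np.diag([1.0, 4.0])]
    R = [[np.eye(1), np.eye(1)], [np.eye(1), np.eye(1)]]
    K0 = [np.array([[2.0, 0.0]]), np.array([[0.0, 1.0]])]
    P, K = solve_coupled_are(A, B, Q, R, K0)
    print(K)
```

For the decoupled usage example, each player's scalar Riccati equation has the closed-form root $p = a + \sqrt{a^2 + q}$. That gives $K_1 = [3, 0]$ for $a = 1, q = 3$ and $K_2 = [0, 2]$ for $a = 0, q = 4$, which the iteration should reproduce.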

Substituting the obtained solutions $P_i$ into the equation that the system state and the optimal control input of the i-th player's agents must satisfy yields the optimal system state and the optimal control input of each player's agents at Nash equilibrium.
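Once the gains are known, the noise-free samples $x^*(kT)$ and $u_i^*(kT)$ at the observation points follow from the closed-loop system. The NumPy sketch below assumes linear dynamics $\dot{x} = Ax + \sum_i B_i u_i$, feedback $u_i = -K_i x$ and $t_0 = 0$. It computes the one-step transition matrix $e^{A_{cl}T}$ by scaling and squaring with a truncated Taylor series; `scipy.linalg.expm` would be the usual library choice. Function names are hypothetical.

```python
import numpy as np


def expm_taylor(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    # Scale so that ||M / 2^s||_1 <= 0.5, where 20 Taylor terms are ample.
    s = max(0, int(np.ceil(np.log2(norm / 0.5)))) if norm > 0 else 0
    Ms = M / 2**s
    E = term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ Ms / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def nash_trajectory(A, B, K, x0, T, M):
    """Sample x*(kT) and u_i*(kT) = -K_i x*(kT), k = 1..M, of the closed loop
    dx/dt = (A - sum_i B_i K_i) x with x(0) = x0."""
    A_cl = A - sum(Bi @ Ki for Bi, Ki in zip(B, K))
    Phi = expm_taylor(A_cl * T)  # transition matrix over one observation period
    X = np.empty((A.shape[0], M))
    x = np.asarray(x0, dtype=float)
    for k in range(M):
        x = Phi @ x
        X[:, k] = x
    U = [-Ki @ X for Ki in K]
    return X, U
```

Propagating with the exact transition matrix, instead of integrating the ODE step by step, means the sampled states carry no discretization error. Any mismatch in the later verification step can then be attributed to the identified weights.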

White Gaussian noise of 30 dB is added to the optimal system state and the players' control inputs to obtain the corresponding noise-corrupted observations.
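The sketch below reads "30 dB" as a signal-to-noise ratio measured per channel; this is an assumption, since the text does not define the reference power. Under that reading, the noise variance of each state or input channel is its mean signal power divided by $10^{30/10} = 1000$. The function name is hypothetical.

```python
import numpy as np


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = 10^(snr_db/10).

    Signal power is measured from the samples (mean of squares along the last
    axis), so each row (one state component or input channel) gets its own level.
    """
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal**2, axis=-1, keepdims=True)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return signal + rng.standard_normal(signal.shape) * np.sqrt(p_noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sin(np.linspace(0, 50, 200_000))
    y = add_awgn(x, 30.0, rng)
    print(10 * np.log10(np.mean(x**2) / np.mean((y - x) ** 2)))  # about 30
```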

S2: identify the estimate of the optimal feedback matrix of each player;

Based on the noisy system state information $x(kT)$ and the control input information $u_i(kT)$ of each player, least-squares identification is used to estimate the optimal feedback matrix $\hat{K}_i$ of each player in the game.

The calculation formula is as follows:

S3: establish the bilevel optimization model;

An optimization model is established from three ingredients: the obtained dynamic equation of the system, the optimal game strategy equation, and the inversion identification problem of each player's optimal feedback matrix with respect to the cooperative game strategy:

where i, j ∈ {1, ..., N}, and $Q_i$ and $R_{ij}$ are constrained to be positive definite. The optimal estimates of these matrices obtained from the model constitute the inversion identification of the cooperative strategy of the multi-agent game system.

S4: solve the optimization model;

(1) Using the relationship between the solutions of the coupled algebraic Riccati equations and the estimated optimal feedback matrices, the optimization model is equivalently converted into a quadratic programming problem:

$$ \text{s.t.}\quad f_k(\theta_i) \ge 0, \quad k = 1, \dots, K $$

where $f_k(\theta_i)$ are the nonlinear constraint terms obtained by converting $Q_i \succ 0$ and $R_{ij} \succ 0$, and K is the number of nonlinear constraints obtained by the conversion;

(2) A logarithmic barrier function is introduced to convert the nonlinearly constrained optimization problem into an unconstrained optimization problem:

(3) Solving the unconstrained problem with Newton's method yields the weight matrices $Q_i$ and $R_{ij}$ of each player's cooperative game strategy objective function.
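Steps (2) and (3) can be sketched generically. The code below minimizes a quadratic $\tfrac{1}{2}\theta^\top H\theta + g^\top\theta$ subject to $f_k(\theta) > 0$ by adding the barrier $-\mu \sum_k \log f_k(\theta)$, taking damped Newton steps, and shrinking $\mu$. How the original forms the $f_k$ is not preserved. The usage lines assume one common choice, Sylvester's criterion: a symmetric matrix is positive definite exactly when all its leading principal minors are positive, so for $[[a, b], [b, c]]$ the constraints are $a > 0$ and $ac - b^2 > 0$. The names and parameters (`mu0`, `shrink`, `mu_min`) are hypothetical.

```python
import numpy as np


def barrier_newton(H, g, cons, theta0, mu0=1.0, shrink=0.1, mu_min=1e-9, tol=1e-12):
    """Minimise 0.5 theta^T H theta + g^T theta subject to f_k(theta) > 0,
    using a log-barrier and damped Newton steps.

    cons: list of (f, grad_f, hess_f) callables, one per constraint f_k.
    theta0: a strictly feasible starting point.
    """
    theta, mu = np.asarray(theta0, dtype=float), mu0

    def phi(t):  # barrier objective for the current mu
        vals = np.array([f(t) for f, _, _ in cons])
        if np.any(vals <= 0):
            return np.inf  # outside the domain of the log-barrier
        return 0.5 * t @ H @ t + g @ t - mu * np.sum(np.log(vals))

    while mu > mu_min:
        for _ in range(100):  # Newton iterations for a fixed mu
            grad, hess = H @ theta + g, np.array(H, dtype=float)
            for f, df, d2f in cons:
                fk, gk = f(theta), df(theta)
                # Gradient and Hessian of -mu * log f_k(theta)
                grad = grad - mu * gk / fk
                hess = hess + mu * (np.outer(gk, gk) / fk**2 - d2f(theta) / fk)
            step = np.linalg.solve(hess, -grad)
            if -grad @ step < tol:  # squared Newton decrement
                break
            s, phi0 = 1.0, phi(theta)
            while s > 1e-12 and phi(theta + s * step) > phi0 + 0.25 * s * (grad @ step):
                s *= 0.5  # backtracking keeps the iterate strictly feasible
            theta = theta + s * step
        mu *= shrink
    return theta


if __name__ == "__main__":
    # theta = (a, b, c) of a symmetric 2x2 matrix, kept positive definite
    # through its leading principal minors (Sylvester's criterion).
    target = np.array([2.0, 0.5, 1.0])
    cons = [
        (lambda t: t[0], lambda t: np.array([1.0, 0.0, 0.0]), lambda t: np.zeros((3, 3))),
        (lambda t: t[0] * t[2] - t[1] ** 2,
         lambda t: np.array([t[2], -2.0 * t[1], t[0]]),
         lambda t: np.array([[0.0, 0.0, 1.0], [0.0, -2.0, 0.0], [1.0, 0.0, 0.0]])),
    ]
    print(barrier_newton(2.0 * np.eye(3), -2.0 * target, cons, [1.0, 0.0, 1.0]))
```

Because $-\log a - \log(ac - b^2)$ is convex on the positive definite cone, every Newton step here is a descent step. The backtracking loop only has to keep the iterate strictly inside the feasible region.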

S5: verify the accuracy of the weight matrices $Q_i$ and $R_{ij}$ obtained by inversion identification;

The weight matrices $Q_i$ and $R_{ij}$ obtained by inversion identification are substituted into the linear-quadratic closed-loop dynamic game problem of step 2, which is solved again to obtain the optimal system state at Nash equilibrium. The accuracy of the inversion identification algorithm is then verified by the relative error level computed as follows:

$$ e_{\max} = \max(e_1, e_2, \dots, e_n) $$

where $\hat{x}_j(kT)$ denotes the j-th component of the estimated state at time kT, and $e_{\max}$ indicates the relative error level of the algorithm. In this embodiment, the accuracy under noise-free conditions and the robustness under 30 dB white Gaussian noise are verified separately:

(1) Accuracy of the inversion identification method under noise-free conditions: numerical verification is performed on 100 randomly generated sets of $Q_i$ and $R_{ij}$. FIG. 3 and FIG. 4 show the resulting relative error distribution and histogram of the system state prediction. Under noise-free conditions the relative estimation error of the system state is below $10^{-9}$, i.e., the proposed cooperative strategy inversion identification method for non-cooperative games of multi-agent systems is accurate in the absence of noise.

(2) Robustness of the inversion identification method under 30 dB white Gaussian noise: numerical verification is performed on 100 randomly generated sets of $Q_i$ and $R_{ij}$. FIG. 5 and FIG. 6 show the resulting relative error distribution and histogram of the system state prediction. Under 30 dB white Gaussian noise, the probability that the relative estimation error of the system state is below 0.1 reaches 90%, i.e., the proposed method has a degree of robustness to noise interference.

The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.

Claims

1. A cooperative strategy inversion identification method based on a multi-agent game, for a closed-loop multi-agent dynamic game decision model with multiple participating players, characterized by comprising the following steps:

S1: collecting the mixed signals generated by the multi-agent game system, separating the collected mixed signals, and computing the system state information and the control input information of each agent, which are:

$$ x(kT) = x^*(kT) + v(kT), \quad k = 1, \dots, M $$

$$ u_i(kT) = u_i^*(kT) + w_i(kT), \quad k = 1, \dots, M $$

where k denotes the k-th observation point, T is the observation period, and M is the total number of observation points in the period. $x(kT)$ and $u_i(kT)$ are the observed system motion state and the observed control input of the i-th agent, and $v(kT)$ and $w_i(kT)$ are the corresponding observation noises;

S2: identifying the optimal feedback matrix of each player;

identifying, from the obtained system state information and the control input information of each agent, the estimate $\hat{K}_i$ of the optimal feedback matrix of each agent in the game;

S3: constructing a bilevel optimization model;

where i, j ∈ {1, …, N} and N is the number of players. $Q_i$ and $R_{ij}$ are positive definite matrices, and their optimal estimates constitute the inversion identification of the cooperative strategy of the multi-agent game system;

S5: carrying out accuracy verification on the inversion identification result;

2. The multi-agent game-based cooperative strategy inversion identification method according to claim 1, wherein the separation of the collected mixed signals in step S1 comprises the following steps:

3. The multi-agent game-based cooperative strategy inversion identification method according to claim 1, wherein acquiring the noisy system state information and the control input information of each player in step S1 comprises the following steps:

first, each player determines its cooperative game strategy by minimizing the following objective function:

$$ x(t_0) = x_0 $$

where i, j ∈ {1, …, N} and N is the number of players. The cooperative characterization matrices $Q_i$ and $R_{ij}$ represent the cooperative game strategy adopted by the i-th player in the game;

each player executes its own cooperative game control strategy by solving the coupled algebraic Riccati equations, with the following calculation formula:

wherein:

4. The cooperative strategy inversion identification method based on multi-agent gaming according to claim 1, wherein in step S2 the estimate of the optimal feedback matrix is obtained by least-squares identification from the noisy system state information and the control input information of each agent, with the following calculation formula:

where $\hat{K}_i$ is the estimate of the optimal feedback matrix and $K_i$ is the feedback matrix of the i-th player. k denotes the k-th observation point, T the observation period, and M the total number of observation points in the period. $x(kT)$ and $u_i(kT)$ are the observed system motion state and the observed control input of the i-th agent, and $v(kT)$ and $w_i(kT)$ are the corresponding observation noises.

5. The cooperative strategy inversion identification method based on multi-agent gaming according to claim 1, wherein in step S3 obtaining the dynamic equation of the multi-agent game system comprises the following steps:

B. if no prior information about the agents of the players participating in the game is available, identifying the dynamic equations and control inputs of the agents by a blind system identification method for multi-input multi-output systems to obtain the dynamic equation of the system;

the dynamic equation is as follows:

where i ∈ {1, …, N} and N is the number of players participating in the game. $x(t) \in \mathbb{R}^n$ is the state of the whole system, $x_i(t)$ is the state information of the i-th player's agents, and $u_i(t)$ is the control input information of the i-th player's agents;

the m-th component of $u_i(t)$ is the control input of the i-th player's m-th agent, and $B_i$ is a diagonal matrix.

6. The multi-agent game-based cooperative strategy inversion identification method according to claim 1, wherein in step S4 the model is converted and solved as follows:

using the relationship between the solution of the inner-layer coupled algebraic Riccati equations and the optimal feedback matrix, the model is equivalently transformed into a quadratic programming problem:

$$ \text{s.t.}\quad Q_i \succ 0, \quad R_{ij} \succ 0 $$

where $I_n$ is the n × n identity matrix and $I_p$ is the p × p identity matrix.

7. The multi-agent game-based cooperative strategy inversion identification method according to claim 1, wherein in step S5, the error is calculated by the following method:

$$ e_{\max} = \max(e_1, e_2, \dots, e_n) $$

where $\hat{x}_j(kT)$ denotes the j-th component of the estimated state at time kT and $e_{\max}$ is the relative error level of the algorithm. k denotes the k-th observation point, T the observation period, and M the total number of observation points in the period.