CN113778619B - Multi-agent state control method, device and terminal for multi-cluster game - Google Patents

Publication number
CN113778619B
Authority
CN
China
Prior art keywords
cluster
agent
intelligent
communication
neighbor
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110923586.9A
Other languages
Chinese (zh)
Other versions
CN113778619A (en)
Inventor
尉越
崔金强
丁玉隆
商成思
宋伟伟
孙涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Application filed by Peng Cheng Laboratory
Priority to CN202110923586.9A
Publication of CN113778619A
Application granted
Publication of CN113778619B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-agent state control method, device and terminal for a multi-cluster game. First communication parameters from each agent to its neighbor agents within each cluster are determined according to a first communication relation among the agents in each cluster of the agent system. Second communication parameters from each cluster to its neighbor clusters are determined according to a second communication relation among the leading agents of the clusters. An agent state control function is then constructed from a preset inequality constraint, a preset cost function, the first communication parameters and the second communication parameters, and the state of each agent is controlled according to this function, so that the whole agent system reaches Nash equilibrium.

Description

Multi-agent state control method, device and terminal for multi-cluster game
Technical Field
The invention relates to the technical field of multi-agent system control, in particular to a multi-agent state control method, device and terminal for multi-cluster game.
Background
A multi-agent system can be regarded as a group of autonomous, intelligent unmanned platforms that completes group tasks through the exchange of information and actions. Cooperative behavior among the agents greatly raises the intelligence of the group as a whole and lets it cope with complex tasks that a single agent cannot complete. Such systems are widely applied in fields including sensor group deployment, formation control of multiple unmanned aerial vehicles, and cooperative transportation by multiple robotic arms.
However, when a game exists between agent clusters, the prior art offers no method for making the state of an agent system composed of agent clusters converge to the Nash equilibrium point of the game problem.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
In view of the above deficiency of the prior art, the invention provides a multi-agent state control method, device and terminal for a multi-cluster game. It aims to solve the problem that, when a game exists between agent clusters, the prior art has no method for making the state of an agent system composed of agent clusters converge to the Nash equilibrium point of the game problem.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
In a first aspect of the present invention, a multi-agent state control method for a multi-cluster game is provided, the method comprising:
acquiring a first communication relation among the agents in each cluster of the agent system and a second communication relation among the leading agents of the clusters;
determining first communication parameters from each agent to its neighbor agents within each cluster according to the first communication relation, and determining second communication parameters from each cluster to its neighbor clusters according to the second communication relation;
constructing an agent state control function according to a preset inequality constraint, a preset cost function, the first communication parameters and the second communication parameters;
and controlling the state of each agent at each moment according to the agent state control function, so that the agent system reaches Nash equilibrium.
In the multi-agent state control method for a multi-cluster game, determining the first communication parameters from each agent to its neighbor agents within each cluster according to the first communication relation comprises:
neighbor agents within each cluster communicate undirectedly; the neighbor agent set of agent $i$ within cluster $j$ is represented as [formula omitted], the first communication parameter from agent $i$ to agent $k$ is denoted $a_{ik}$, and the first communication parameter from agent $k$ to agent $i$ is denoted $a_{ki}$.
In the multi-agent state control method for a multi-cluster game, determining the second communication parameters from each cluster to its neighbor clusters according to the second communication relation comprises:
the leading agent of a cluster and the leading agent of a neighbor cluster communicate directedly; the neighbor cluster set of cluster $j$ is represented as [formula omitted], the second communication parameter from cluster $j$ to cluster $l$ is denoted $a_{jl}$, and when the leading agent of cluster $j$ can receive information from the leading agent of cluster $l$, $a_{jl} > 0$.
In the multi-agent state control method for a multi-cluster game, the cost function of cluster $j$ is:
[formula omitted]
where [formula omitted], $k = 1, 2$; [formula omitted] is the local cost function of agent $i$ in cluster $j$; and [formula omitted] is a convex function;
the state decision variable of agent $i$ in cluster $j$ is expressed as [formula omitted], where $j \in \{1, \dots, m\}$, $i \in \{1, \dots, n_j\}$ and [formula omitted]; $m$ is the number of clusters in the agent system, $n_j$ is the number of agents in cluster $j$, and $n$ is the number of agents in the agent system. The stacked form of the state decision variables of all agents in cluster $j$ is denoted $x_j$. The state decision variables of all participating agents other than $x_j$ are denoted $x_{-j}$, and [formula omitted] is the part of $x_{-j}$ defined as the information of $x_{-j}$ that agent $i$ in cluster $j$ can obtain.
In the multi-agent state control method for a multi-cluster game, for all $j \in \{1, \dots, m\}$ and $i \in \{1, \dots, n_j\}$ and for given $x_{-j}$, the cost function [formula omitted] is twice continuously differentiable and strongly convex with respect to [formula omitted], and [formula omitted] is a lower semicontinuous closed convex function with respect to [formula omitted].
In the multi-agent state control method for a multi-cluster game, the inequality constraint of cluster $j$ is $g_j$.
In the multi-agent state control method for a multi-cluster game, for given $x_{-j}$, $g_j$ is a lower semicontinuous closed convex function with respect to [formula omitted], and $g_j$ is $K_{2,j}$-Lipschitz continuous, where $K_{2,j} > 0$ is a constant.
In the multi-agent state control method for a multi-cluster game, the agent state control function is:
[formula omitted]
where [formula omitted] represents the state decision vector of the leading agent of cluster $j$; [formula omitted], $y_j$, $\nu_j$, $\eta_j$, $\omega_j$ and $r_j$ are auxiliary variables; [formula omitted], and when $i \neq 1$, [formula omitted]; $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are constants.
In the multi-agent state control method for a multi-cluster game, controlling the state of each agent according to the agent state control function comprises:
determining the state decision vector of each agent at each moment according to the agent state control function;
and determining the target state of each agent at each moment according to its state decision vector at that moment, and controlling each agent to reach, at each moment, the target state corresponding to that moment.
In a second aspect of the present invention, there is provided a multi-agent state control device for a multi-cluster game, the device comprising:
a communication relation acquisition module, used for acquiring a first communication relation among the agents in each cluster of the agent system and a second communication relation among the leading agents of the clusters;
a communication parameter determining module, used for determining first communication parameters from each agent to its neighbor agents within each cluster according to the first communication relation, and determining second communication parameters from each cluster to its neighbor clusters according to the second communication relation;
a function construction module, used for constructing an agent state control function according to a preset inequality constraint, a preset cost function, the first communication parameters and the second communication parameters;
and a state control module, used for controlling the state of each agent according to the agent state control function, so that the agent system reaches Nash equilibrium.
In the multi-agent state control device for a multi-cluster game, the agent state control function is:
[formula omitted]
where the state decision variable of agent $i$ in cluster $j$ is expressed as [formula omitted], with $j \in \{1, \dots, m\}$, $i \in \{1, \dots, n_j\}$ and [formula omitted]; $m$ is the number of clusters in the agent system, $n_j$ is the number of agents in cluster $j$, and $n$ is the number of agents in the agent system; the stacked form of the state decision variables of all agents in cluster $j$ is denoted $x_j$; the state decision variables of all participating agents other than $x_j$ are denoted $x_{-j}$; [formula omitted] is the part of $x_{-j}$ defined as the information of $x_{-j}$ that agent $i$ of cluster $j$ can receive; the inequality constraint of cluster $j$ is $g_j$; [formula omitted] represents the state decision vector of the leading agent of cluster $j$; [formula omitted], $y_j$, $\nu_j$, $\eta_j$ and $\omega_j$ are auxiliary variables; [formula omitted], and when $i \neq 1$, [formula omitted]; $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are constants; the neighbor agent set of agent $i$ within cluster $j$ is denoted [formula omitted]; the first communication parameter from agent $i$ to agent $k$ is denoted $a_{ik}$; the neighbor cluster set of cluster $j$ is denoted [formula omitted]; the second communication parameter from cluster $j$ to cluster $l$ is denoted $a_{jl}$; and [formula omitted] is the local cost function of agent $i$ in cluster $j$.
In a third aspect of the present invention, there is provided a terminal comprising a processor, a storage medium in communication with the processor, the storage medium adapted to store a plurality of instructions, the processor adapted to invoke the instructions in the storage medium to perform the steps of implementing the multi-agent state control method of multi-cluster gaming as described in any of the preceding claims.
In a fourth aspect of the present invention, there is provided a storage medium storing one or more programs executable by one or more processors to implement the steps of the multi-agent state control method for multi-cluster gaming described in any of the above.
Compared with the prior art, the invention provides a multi-agent state control method, a terminal and a storage medium for a multi-cluster game. In the method, first communication parameters from each agent to its neighbor agents within each cluster are determined according to the first communication relation among the agents in each cluster of the agent system; second communication parameters from each cluster to its neighbor clusters are determined according to the second communication relation among the leading agents of the clusters; and the state control function of each agent is constructed from the preset inequality constraint and cost function, the first communication parameters and the second communication parameters. Controlling the state of each agent according to this function brings the agent system to Nash equilibrium.
Drawings
FIG. 1 is a flowchart of an embodiment of the multi-agent state control method for a multi-cluster game provided by the present invention;
FIG. 2 is a schematic diagram of the communication relation of the multi-agent system in an embodiment of the multi-agent state control method for a multi-cluster game provided by the present invention;
FIG. 3 is a first schematic diagram of the validity verification results of the multi-agent state control method for a multi-cluster game provided by the present invention;
FIG. 4 is a second schematic diagram of the validity verification results of the multi-agent state control method for a multi-cluster game provided by the present invention;
FIG. 5 is a third schematic diagram of the validity verification results of the multi-agent state control method for a multi-cluster game provided by the present invention;
FIG. 6 is a fourth schematic diagram of the validity verification results of the multi-agent state control method for a multi-cluster game provided by the present invention;
FIG. 7 is a fifth schematic diagram of the validity verification results of the multi-agent state control method for a multi-cluster game provided by the present invention;
FIG. 8 is a schematic structural diagram of the multi-agent state control device for a multi-cluster game provided by the present invention;
FIG. 9 is a schematic diagram of an embodiment of the terminal provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the particular embodiments presented herein are illustrative of the invention and are not intended to limit the invention.
Example 1
The multi-agent state control method for multi-cluster game provided by the invention can be applied to a terminal, and the terminal can control the state of each agent in the multi-agent system through the multi-agent state control method for multi-cluster game provided by the invention, so that the multi-agent system achieves Nash equilibrium.
In this specification, $\|\cdot\|_1$ denotes the $l_1$ norm. The symbol $\mathbb{R}$ denotes the set of real numbers and $\mathbb{R}_{+}$ the set of positive real numbers. $\mathrm{diag}\{b_1, \dots, b_n\}$ denotes a diagonal matrix whose $i$-th diagonal element is $b_i$. The vector $0_n$ is the all-zero vector and the matrix $O_n$ is the $n$-dimensional zero matrix. The symbol $(\cdot)^T$ denotes matrix transposition, and $\dot{x}$ denotes the derivative of $x$ with respect to time.
As shown in fig. 1, in one embodiment of the multi-agent state control method for multi-cluster gaming, the method includes the steps of:
S100, acquiring a first communication relation among the intelligent agents in each cluster of the intelligent agent system and a second communication relation among the leading intelligent agents of each cluster.
Specifically, the agent system comprises $m$ clusters, each containing a plurality of agents. The agents in a cluster must reach agreement on the decision of the cluster. Each cluster has one leading agent, which can obtain the corresponding local resource allocation constraint and inequality constraint information. The $m$ leading agents play a game over the inter-cluster topology network and determine a local generalized Nash equilibrium search strategy.
S200, determining first communication parameters from each intelligent agent to neighbor intelligent agents in each cluster according to the first communication relation, and determining second communication parameters from each cluster to neighbor clusters according to the second communication relation.
Specifically, determining the first communication parameters from each agent to its neighbor agents within each cluster according to the first communication relation comprises:
neighbor agents within each cluster communicate undirectedly; the neighbor agent set of agent $i$ within cluster $j$ is represented as [formula omitted], the first communication parameter from agent $i$ to agent $k$ is denoted $a_{ik}$, and the first communication parameter from agent $k$ to agent $i$ is denoted $a_{ki}$.
The neighbor agents of an agent in a cluster are the agents that can exchange information with it. As shown in FIG. 2, neighbor agents within each cluster communicate undirectedly, that is, an agent can both receive information from its neighbor agents and send information to them. Cluster $j$ may be represented by an undirected graph $G_j$ with Laplacian matrix $L_j = D_j - A_j$. Here $D_j$ is the degree matrix, a diagonal matrix whose diagonal elements are the degrees [formula omitted] of the agents $i \in \{1, 2, \dots, n_j\}$ in the cluster, and $n_j$ is the number of agents in the $j$-th cluster. Agent $k$ is connected to agent $i$ by an information interaction edge $e_{ik} \in E$, which indicates that agent $i$ and agent $k$ can exchange information in both directions: if $e_{ik} \in E$ then $a_{ik} = a_{ki} > 0$, otherwise $a_{ik} = 0$. $A_j$ is the weighted adjacency matrix. Thus, from the communication relation among the agents in a cluster, each first communication parameter from each agent to each of its neighbor agents can be determined, and with it the Laplacian matrix of the undirected graph corresponding to the cluster.
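The construction $L_j = D_j - A_j$ can be illustrated with a minimal sketch. The formula for the degrees is not legible in this text, so the sketch assumes the usual weighted degree (the row sum of the adjacency matrix); the function name `undirected_laplacian` and the 4-agent line graph are hypothetical.

```python
def undirected_laplacian(n, weighted_edges):
    """Return (A, D, L) for an undirected graph on n agents.

    weighted_edges maps a pair (i, k) of distinct agent indices to a
    positive weight a_ik. Assumption: the degree of agent i is the sum
    of its weights a_ik, which is the usual weighted-degree convention.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, k), w in weighted_edges.items():
        if i == k or w <= 0:
            raise ValueError("an edge needs two distinct agents and a positive weight")
        A[i][k] = w  # a_ik
        A[k][i] = w  # a_ki = a_ik because communication is undirected
    degrees = [sum(row) for row in A]
    D = [[degrees[i] if i == k else 0.0 for k in range(n)] for i in range(n)]
    L = [[D[i][k] - A[i][k] for k in range(n)] for i in range(n)]
    return A, D, L


# Hypothetical cluster of four agents connected in a line: 0 - 1 - 2 - 3.
A, D, L = undirected_laplacian(4, {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0})
for row in L:
    print(row)
```

Every row of the resulting matrix sums to zero and the matrix is symmetric, which is what makes it usable as an agreement measure inside a cluster.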
Determining the second communication parameters from each cluster to its neighbor clusters according to the second communication relation comprises:
the leading agent of a cluster and the leading agent of a neighbor cluster communicate directedly; the neighbor cluster set of cluster $j$ is represented as [formula omitted], the second communication parameter from cluster $j$ to cluster $l$ is denoted $a_{jl}$, and when the leading agent of cluster $j$ can receive information from the leading agent of cluster $l$, $a_{jl} > 0$.
When the leading agent of cluster A can receive information from, or send information to, the leading agent of cluster B, cluster B is said to be a neighbor cluster of cluster A. As shown in FIG. 2, communication between the leading agent of a cluster and the leading agent of a neighbor cluster may be directed, that is, a leading agent may be able only to receive information from the leading agent of a neighbor cluster without being able to send information to it. The communication relation between clusters may be represented by a directed graph $G_0$ with Laplacian matrix $L_0 = D_{in} - A_0$. Here $D_{in}$ is the in-degree matrix, a diagonal matrix whose diagonal elements are the in-degrees [formula omitted] of the clusters $j \in \{1, 2, \dots, m\}$, and $m$ is the number of clusters in the multi-agent system. Cluster $l$ is connected to cluster $j$ by an information interaction edge $e_{jl} \in E$, which indicates that the leading agent of cluster $j$ can receive information from the leading agent of cluster $l$: if $e_{jl} \in E$ then $a_{jl} > 0$, otherwise $a_{jl} = 0$. $A_0$ is the weighted adjacency matrix. Thus, from the communication relation among the leading agents of the clusters, each second communication parameter from each cluster to each of its neighbor clusters can be determined, and with it the Laplacian matrix of the directed graph corresponding to the multi-agent system.
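A small sketch of the inter-cluster graph may help. It builds $L_0 = D_{in} - A_0$ and checks the two properties the description later assumes for $G_0$ (strongly connected and balanced). The in-degree formula is not legible in this text, so the row sum of $A_0$ is assumed; all function names and the four-cluster ring are hypothetical.

```python
def in_degree_laplacian(A0):
    """L0 = D_in - A0, where A0[j][l] > 0 means cluster j receives from cluster l.

    Assumption: the in-degree of cluster j is the sum of row j of A0.
    """
    m = len(A0)
    d_in = [sum(A0[j]) for j in range(m)]
    return [[(d_in[j] if j == l else 0.0) - A0[j][l] for l in range(m)]
            for j in range(m)]


def is_balanced(A0):
    """True if every cluster's total in-weight equals its total out-weight."""
    m = len(A0)
    return all(abs(sum(A0[j]) - sum(A0[l][j] for l in range(m))) < 1e-12
               for j in range(m))


def is_strongly_connected(A0):
    """True if every cluster reaches every other cluster along directed edges."""
    m = len(A0)

    def reachable(start, follow_rows):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in range(m):
                weight = A0[u][v] if follow_rows else A0[v][u]
                if weight > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    # Strong connectivity: node 0 reaches all nodes in the graph and in its reverse.
    return len(reachable(0, True)) == m and len(reachable(0, False)) == m


# Hypothetical directed ring of four clusters: 0 <- 3, 1 <- 0, 2 <- 1, 3 <- 2.
ring = [[0, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0]]
L0 = in_degree_laplacian(ring)
print(is_balanced(ring), is_strongly_connected(ring))
```

For a balanced graph the columns of $L_0$ sum to zero as well as the rows, which is a quick numerical check on a hand-written adjacency matrix.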
Referring to fig. 1 again, the multi-agent state control method for multi-cluster game provided in this embodiment further includes the steps of:
S300, constructing an intelligent agent state control function according to preset inequality constraint, preset cost function, each first communication parameter and each second communication parameter.
Specifically, the decision variable of agent $i$ in cluster $j$ is expressed as [formula omitted], a $q$-dimensional vector, where $j \in \{1, \dots, m\}$, $i \in \{1, \dots, n_j\}$ and [formula omitted]. The stacked form of the decision variables of all agents in cluster $j$ is denoted $x_j$. Without loss of generality, the leading agent of each cluster is labeled as the first agent of the cluster, so that [formula omitted] represents the state decision vector of the leading agent of cluster $j$. The $m$ leading agents play a game and decide on a local generalized Nash equilibrium search strategy over the inter-cluster topology network, and the communication topology among them can be modeled as the balanced directed graph $G_0$ introduced above. In contrast, because of the cooperation requirement, the connection topology of the agents inside cluster $j \in \{1, \dots, m\}$ is represented by the undirected graph $G_j$ introduced above. The decision variables of all participating agents other than $x_j$ are denoted $x_{-j}$. The agents in cluster $j \in \{1, \dots, m\}$ can receive information of $x_{-j}$ only from the neighbor clusters [formula omitted] in $G_0$. Furthermore, [formula omitted] is the part of $x_{-j}$ defined as the information of $x_{-j}$ that agent $i$ in cluster $j$ can obtain, i.e. [formula omitted]. The cost function of cluster $j$ is:
[formula omitted]
where [formula omitted], $k = 1, 2$; [formula omitted] is the local cost function of agent $i$ in cluster $j$; and [formula omitted] is a convex function.
For each agent $i$, the local cost function contains two convex functions [formula omitted], of which the first is smooth on [formula omitted] and the second is non-smooth on [formula omitted]. The local inequality constraint [formula omitted] is non-smooth and is known only to the leading agent of cluster $j$. For given $x_{-j}$, the goal of the agents in cluster $j$ taking part in the game is to minimize the cost function value of cluster $j$ while satisfying the constraints, i.e. to solve the minimization problem:
$\min\; F_j(x_j, x_{-j})$
$\text{s.t.}\; (x_j, x_{-j}) \in \Omega$  (1)
where [formula omitted] is the feasible set of the decision variables, in which [formula omitted], and the matrix $L_j$ is the Laplacian matrix of the graph $G_j$ corresponding to cluster $j$.
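The feasible-set formula is not legible in this text. Since the description requires the agents of a cluster to agree on the cluster's decision and names the Laplacian $L_j$ in the feasible set, one plausible reading (an assumption, not a quotation) is that agreement is encoded through $L_j$. The sketch below shows the underlying fact for a connected undirected graph: the Laplacian applied to the stacked agent states vanishes exactly when all agents hold the same state. The function name and the 3-agent example are hypothetical.

```python
def consensus_residual(L, states):
    """Row i is sum_k L[i][k] * states[k], the disagreement seen by agent i.

    L is the n-by-n Laplacian of the cluster graph and states is a list of
    n q-dimensional state vectors, one per agent.
    """
    n, q = len(states), len(states[0])
    return [[sum(L[i][k] * states[k][d] for k in range(n)) for d in range(q)]
            for i in range(n)]


# Hypothetical cluster of three agents in a line, unit weights.
L = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]

agreed = [[2.0, -1.0], [2.0, -1.0], [2.0, -1.0]]  # all agents hold the same state
split = [[2.0, -1.0], [2.0, -1.0], [3.0, -1.0]]   # the third agent disagrees

print(consensus_residual(L, agreed))
print(consensus_residual(L, split))
```

The residual is identically zero for the agreed states and non-zero for the rows adjacent to the disagreeing agent.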
The generalized Nash equilibrium of the multi-cluster game problem (1) is defined as follows:
A decision variable $x^*$ is a generalized Nash equilibrium of the multi-cluster game problem (1) if, for the agents in all clusters $j \in \{1, \dots, m\}$:
[formula omitted]
Lemma 1: Define [formula omitted], $d \in D(x^*)$. A solution of the generalized variational inequality
[formula omitted]
is a generalized Nash equilibrium of the multi-cluster game problem (1), where $j \in \{1, \dots, m\}$.
To make the description of the multi-cluster game problem (1) of the multi-agent system rigorous, several assumptions are given:
1. For all $j \in \{1, \dots, m\}$ and $i \in \{1, \dots, n_j\}$ and for given $x_{-j}$, the cost function [formula omitted] is twice continuously differentiable and strongly convex with respect to [formula omitted]; that is, there is a constant $K_1 > 0$ such that for agent $i$: [formula omitted], where $y$ and $z$ are auxiliary variables, [formula omitted], $y \neq z$.
2. For all $j \in \{1, \dots, m\}$ and $i \in \{1, \dots, n_j\}$ and for given $x_{-j}$, [formula omitted] and $g_j$ are lower semicontinuous closed proper convex functions with respect to [formula omitted], and the corresponding [formula omitted] can be obtained by simple calculation. In addition, $g_j$ is $K_{2,j}$-Lipschitz continuous, with the constant $K_2 = \max_{j \in \{1, \dots, m\}} K_{2,j}$ and $K_2 > 0$. That is, for the inequality constraint $g_j$ of each cluster the corresponding $K_{2,j}$ can be determined, and the maximum of all $K_{2,j}$ is taken as $K_2$.
3. The inter-cluster topology graph $G_0$ is a strongly connected balanced directed graph. For all $j \in \{1, \dots, m\}$, the intra-cluster topology graph $G_j$ is an undirected connected graph.
4. The Slater condition of the multi-cluster game problem (1) is satisfied, i.e. there is at least one feasible point for the multi-cluster game problem (1).
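The inequalities behind assumptions 1 and 2 are not legible in this text. For orientation only, the textbook forms of the two properties are sketched below. Here $f$ stands in for the smooth cost term, $y$ and $z$ for the auxiliary variables, and the choice of norm is an assumption, so the patent's own statement may differ in detail.

```latex
% Strong convexity of a differentiable f with constant K_1 > 0 (gradient form):
\bigl(\nabla f(y) - \nabla f(z)\bigr)^{T} (y - z) \;\ge\; K_1 \,\lVert y - z \rVert^{2},
\qquad y \neq z .

% K_{2,j}-Lipschitz continuity of the constraint function g_j:
\lVert g_j(y) - g_j(z) \rVert \;\le\; K_{2,j} \,\lVert y - z \rVert .

% Common constant used across clusters, as stated in assumption 2:
K_2 = \max_{j \in \{1,\dots,m\}} K_{2,j} .
```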
The conditions under which a solution of the generalized variational inequality holds are as follows; such a solution is also a generalized Nash equilibrium of problem (2):
Lemma 2: If [formula omitted] is a generalized Nash equilibrium of problem (1), then there exist constants [formula omitted] and [formula omitted] such that
[formula omitted]
where $j \in \{1, \dots, m\}$, [formula omitted]; when $i \neq 1$, [formula omitted].
Based on the game problem established above, the invention designs a distributed, Lipschitz-continuous generalized Nash equilibrium search algorithm. Specifically, the agent state control function is constructed as follows:
[formula omitted]
where [formula omitted] represents the state decision vector of the leading agent of cluster $j$; [formula omitted], $y_j$, $\nu_j$, $\eta_j$, $\omega_j$ and $r_j$ are auxiliary variables; [formula omitted], and when $i \neq 1$, [formula omitted]; $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are constants.
The compact form is as follows:
[formula omitted]
where [formula omitted] and $L = \mathrm{diag}(L_1, \dots, L_m)$.
Lemma 3: If [formula omitted] is an equilibrium point of algorithm (6), then $x^*$ is a generalized Nash equilibrium of the multi-cluster game (2).
Lemma 4: In particular, if [formula omitted] is an equilibrium point of algorithm (6), then
[formula omitted]
where [formula omitted] and [formula omitted].
The Lyapunov candidate function is designed as $V(x, \mu, y, v, \eta, \omega) = V_1(x, y) + V_2(x, \mu) + V_3(v, \eta, \omega)$,
where
[formula omitted]
and [formula omitted] and [formula omitted], where [formula omitted], and for $i \neq j$ in cluster $j \in \{1, \dots, m\}$, [formula omitted].
The inventors have found that if [formula omitted], $0 < \alpha_4 < 1 - K_2 \alpha_2$, [formula omitted] and [formula omitted], where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimum and maximum eigenvalues of the matrix in brackets, then the trajectory $(x, \mu, y, v, \eta, \omega)$ driven by (6) is bounded and $x$ converges to a generalized Nash equilibrium of the multi-cluster game problem (1) as $t \to \infty$. This conclusion is proved below:
(1) It is easy to obtain $V(x^*, \mu^*, y^*, v^*, \eta^*, \omega^*) = 0$. It is shown below that $V(x, \mu, y, v, \eta, \omega) > 0$ when $(x, \mu, y, v, \eta, \omega) \neq (x^*, \mu^*, y^*, v^*, \eta^*, \omega^*)$. Combining with the condition on $\alpha_3$, we have
[formula omitted]
Since $F_1(x)$ is strongly convex, the relation [formula omitted] holds. It is easy to show that $V_3(v, \eta, \omega) \geq 0$. Combining [formula omitted] and $\alpha_4 > 0$, we have
[formula omitted]
where $\kappa_1 = 1 - \alpha_3 \lambda_{\max}(L) > 0$ and $\kappa_2$ [formula omitted] $> 0$. Thus $V(x, \mu, y, v, \eta, \omega)$ is positive definite and radially unbounded: $V(x, \mu, y, v, \eta, \omega) \geq 0$, and it is zero when $(x, \mu, y, v, \eta, \omega) = (x^*, \mu^*, y^*, v^*, \eta^*, \omega^*)$.
(2) Next, it is proved that for all $t \geq 0$, [formula omitted]. From (6),
[formula omitted]
which means
[formula omitted]
Owing to the convexity of $F_2(x)$, its subdifferential [formula omitted] is monotone, and thus
[formula omitted]
where [formula omitted]. It follows that
[formula omitted]
Combining Lemma 4 and formula (12) gives [formula omitted] and
[formula omitted]
From formulas (12) to (14), the derivative of the Lyapunov candidate function $V(x, \mu, y, v, \eta, \omega)$ along the trajectory of (6) satisfies
[formula omitted]
where
[formula omitted]
Define $v = v_a + v_b$, where $v_a$ is an all-ones vector. Because [formula omitted], we have [formula omitted] and [formula omitted]; because [formula omitted], we also have [formula omitted] and [formula omitted]. Thus, from the assumptions on $\alpha_k$, $k \in \{1, \dots, 5\}$, it can be deduced from formula (16) that
[formula omitted]
where $\varepsilon_1 = 1 - \alpha_4 - \alpha_2 K_2$, $\varepsilon_2 = 2\alpha_1 K_1 - 2\alpha_2 K_2 - \alpha_4 - 2$, [formula omitted], and $\varepsilon_4 = 2 - 2\alpha_2 K_2$. Since $V(x, \mu, y, v, \eta, \omega)$ is positive definite, radially unbounded and bounded below, the equilibrium point $(x^*, \mu^*, y^*, v^*, \eta^*, \omega^*)$ is Lyapunov stable and the trajectory $(x, \mu, y, v, \eta, \omega)$ is bounded.
Define the set [formula omitted], i.e. [formula omitted]. Let $D$ be the largest invariant set in $T$. By the invariance principle, the trajectory $(x, \mu, y, v, \eta, \omega)$ driven by (6) converges to $D$ as $t \to \infty$. If $(x, \mu, y, v, \eta, \omega)$ is a trajectory driven by (6) that starts from $(x_0, \mu_0, y_0, v_0, \eta_0, \omega_0) \in D$, then for all $t \geq 0$, [formula omitted]; thus [formula omitted], which means [formula omitted]. If [formula omitted], then [formula omitted], which contradicts the invariance principle. Furthermore, using $x \equiv x^*$, $\eta \equiv \eta^*$ and $v_b \equiv 0_{mq}$, one can deduce [formula omitted]. From the proof of Lemma 1, [formula omitted]. According to Lemma 3, the trajectory $(x, \mu, y, v, \eta, \omega)$ then converges to an equilibrium point of the system driven by (6). Finally, by Lemma 2, the system state $x$ converges to a generalized Nash equilibrium of the multi-cluster game problem (4) as $t \to \infty$.
Referring to fig. 1 again, the multi-agent state control method for multi-cluster game provided in this embodiment further includes the steps of:
S400, controlling the state of each agent at each moment according to the agent state control function so that the agent system achieves Nash equilibrium.
Controlling the state of each agent according to the agent state control function comprises:
determining the state decision vector of each agent at each moment according to the agent state control function;
and determining the target state of each agent at each moment according to its state decision vector at that moment, and controlling each agent to reach, at each moment, the target state corresponding to that moment.
According to the above proof, the state $x$ of the agent system driven by the agent state control function eventually reaches Nash equilibrium as time passes. The state decision vector of each agent at a target moment can be determined from the agent state control function, and the target state of the agent at that moment can be determined from this state decision vector, so that the state of each agent at each moment is controlled to reach the target state corresponding to that moment.
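The formula of the agent state control function is not legible in this text, so the sketch below does not implement it. As an illustration of how a continuous-time state update yields a state for each moment, it swaps in plain gradient play on a hypothetical unconstrained two-player quadratic game and integrates it with forward Euler. The step of 0.1 and the 3000 iterations mirror the simulation settings reported later in the description; the costs, names and initial state are assumptions.

```python
def pseudo_gradient(x):
    """Each player's gradient of its own cost with respect to its own decision.

    Hypothetical costs (not from the patent):
      J1(x1, x2) = x1**2 + x1*x2 - 2*x1
      J2(x1, x2) = x2**2 + x1*x2 - 2*x2
    Their Nash equilibrium solves 2*x1 + x2 = 2 and x1 + 2*x2 = 2,
    i.e. x1 = x2 = 2/3.
    """
    x1, x2 = x
    return [2.0 * x1 + x2 - 2.0, x1 + 2.0 * x2 - 2.0]


def simulate(x0, step=0.1, iterations=3000):
    """Forward-Euler integration of dx/dt = -pseudo_gradient(x)."""
    x = list(x0)
    trajectory = [tuple(x)]
    for _ in range(iterations):
        g = pseudo_gradient(x)
        # First-order integrator agent: next state = state + step * control input.
        x = [xi - step * gi for xi, gi in zip(x, g)]
        trajectory.append(tuple(x))
    return trajectory


trajectory = simulate([-5.0, 4.0])
final_state = trajectory[-1]
print(final_state)
```

Each entry of `trajectory` plays the role of the target state for one moment; the last entry sits at the equilibrium of the toy game. The patent's function additionally handles non-smooth terms, inequality constraints and partial information, none of which this sketch covers.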
The effectiveness of the proposed method is verified by simulating a multi-cluster game problem of a multi-agent system with non-smooth cost functions and non-smooth inequality constraints under the distributed smooth search function (6). As shown in fig. 2, the system is assumed to consist of sixteen agents, modelled as first-order integrators and divided into four clusters. The leading agents of the clusters are numbered 1, 5, 8 and 13, respectively. The directed communication topology between clusters is drawn as orange solid lines, and the undirected communication topology inside each cluster as blue dashed lines. The distributed multi-cluster game problem of the non-smooth multi-agent system considered in the simulation has the following specific form:
where […]. In the non-smooth inequality constraint […], D_j = [16.1, 15.5, 16.1, 14.5]^T and R_j = [-3, 1, 3, 3, 2, -1, -2, -4]^T. The local cost function f_i(x_i) of agent i consists of the following functions:
[…]
where […] and […] represent a smooth cost function and the indicator function of a local constraint set […], respectively. The initial positions of the agents are x_1(0) = [-5.6 m, 3.9 m, -8 m, 10 m, -11.6 m, 7.5 m, -10 m, 4 m]^T, x_2(0) = [1.8 m, 3.2 m, 4.1 m, 8.9 m, 5.8 m, 4.9 m]^T, x_3(0) = [3.5 m, -1.9 m, 5.2 m, -6.1 m, 1.3 m, -7.2 m, 7.9 m, -4.1 m, 6.1 m, -8.9 m]^T and x_4(0) = [-3.9 m, -1 m, -7.9 m, -3.2 m, -6.1 m, -7.5 m, -9.8 m, -6.1 m]^T. The initial values of the auxiliary variables are all set to zero. The simulation step size is t_p = 0.1 s, the number of iterations is N = 3000, and the running time is t = 361.8 s. The trajectories of the system states […] and […] are shown in figs. 4 and 5. Fig. 6 shows the trajectory of the global benefit function F(x). Fig. 7 shows how the non-smooth inequality constraints […] to which the individual agents are subject evolve over time. As can be seen from figs. 3-7, the state of the multi-agent system eventually satisfies the non-smooth inequality constraints and the resource allocation constraint, and reaches the generalized Nash equilibrium of the multi-cluster game problem.
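As a quick consistency check on the simulation setup, the initial vectors above can be unpacked into per-agent positions. This is a sketch under two assumptions that the text does not state explicitly: each agent's state is two-dimensional with coordinates stored consecutively in the stacked vector, and the leading agent of each cluster is its first agent. The variable names are illustrative.

```python
import numpy as np

# Stacked initial positions per cluster, in metres, copied from the text.
x0 = {
    1: [-5.6, 3.9, -8.0, 10.0, -11.6, 7.5, -10.0, 4.0],
    2: [1.8, 3.2, 4.1, 8.9, 5.8, 4.9],
    3: [3.5, -1.9, 5.2, -6.1, 1.3, -7.2, 7.9, -4.1, 6.1, -8.9],
    4: [-3.9, -1.0, -7.9, -3.2, -6.1, -7.5, -9.8, -6.1],
}

Q = 2  # assumed state dimension per agent (planar position)

# One row per agent, one column per coordinate.
positions = {j: np.array(v).reshape(-1, Q) for j, v in x0.items()}
cluster_sizes = [positions[j].shape[0] for j in sorted(positions)]

# Global number of the first agent in each cluster (assumed to be the leader).
leaders = []
next_id = 1
for size in cluster_sizes:
    leaders.append(next_id)
    next_id += size
```

Under these assumptions the clusters contain 4, 3, 5 and 4 agents (sixteen in total), and the first agents carry the global numbers 1, 5, 8 and 13, which agrees with the leader numbering given for fig. 2.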
In summary, the present embodiment provides a multi-agent state control method for multi-cluster game. The method determines a first communication parameter from each agent in a cluster to its neighbor agents according to a first communication relationship between the agents in each cluster of the agent system, determines a second communication parameter from each cluster to its neighbor clusters according to a second communication relationship between the leading agents of the clusters, and constructs an agent state control function from a preset inequality constraint, a preset cost function, the first communication parameters and the second communication parameters. The agent state control function is then used to control the state of each agent so that the whole agent system reaches Nash equilibrium.
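The two kinds of communication parameters can be pictured as adjacency matrices: a symmetric one for the undirected links inside a cluster and a generally asymmetric one for the directed links between cluster leaders. The sketch below assumes unit weights and uses made-up edges (a path inside one four-agent cluster, a directed ring among four leaders); the actual topology is the one drawn in fig. 2, which is not reproduced here. The helper names and the use of a graph Laplacian are illustrative conventions, not taken from the patent text.

```python
import numpy as np

def undirected_adjacency(n, edges, weight=1.0):
    """Symmetric first communication parameters: A[i, k] == A[k, i]."""
    A = np.zeros((n, n))
    for i, k in edges:
        A[i, k] = A[k, i] = weight
    return A

def directed_adjacency(m, arcs, weight=1.0):
    """Second communication parameters from (receiver, sender) pairs.

    B[j, l] > 0 when the leader of cluster j can receive information
    from the leader of cluster l.
    """
    B = np.zeros((m, m))
    for j, l in arcs:
        B[j, l] = weight
    return B

def laplacian(A):
    """Graph Laplacian (degree matrix minus adjacency matrix)."""
    return np.diag(A.sum(axis=1)) - A

# Hypothetical topology, zero-based indices.
A1 = undirected_adjacency(4, [(0, 1), (1, 2), (2, 3)])        # path in one cluster
B = directed_adjacency(4, [(0, 3), (1, 0), (2, 1), (3, 2)])   # ring of leaders
L1 = laplacian(A1)
```

Keeping the intra-cluster matrix symmetric encodes the undirected assumption directly, while the leader matrix is allowed to be asymmetric because a leader that receives from another leader need not send back to it.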
It should be understood that, although the steps in the flowcharts shown in the drawings of the present specification are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order in which the sub-steps or stages are performed is not necessarily sequential, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps or other steps.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Embodiment two
Based on the above embodiment, the present invention further provides a multi-agent state control device for multi-cluster game, as shown in fig. 8, where the device includes:
a communication relation acquisition module, configured to acquire a first communication relationship between the agents in each cluster of the agent system and a second communication relationship between the leading agents of the clusters, as described in embodiment one;
a communication parameter determining module, configured to determine a first communication parameter from each agent in each cluster to its neighbor agents according to the first communication relationship, and to determine a second communication parameter from each cluster to its neighbor clusters according to the second communication relationship, as described in embodiment one;
a function construction module, configured to construct an agent state control function according to a preset inequality constraint, a preset cost function, each first communication parameter and each second communication parameter, as described in embodiment one;
a state control module, configured to control the state of each agent according to the agent state control function so that the agent system reaches Nash equilibrium, as described in embodiment one.
Embodiment three
Based on the above embodiment, the present invention also correspondingly provides a terminal, as shown in fig. 9, which includes a processor 10 and a memory 20. Fig. 9 shows only some of the components of the terminal, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may alternatively be implemented.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or internal memory of the terminal. In other embodiments the memory 20 may be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various data. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores a multi-agent state control program 30 for multi-cluster gaming, and the program 30 is executed by the processor 10 to implement the multi-agent state control method for multi-cluster gaming of the present application.
The processor 10 may in some embodiments be a central processing unit (Central Processing Unit, CPU), microprocessor or other chip for executing program code or processing data stored in the memory 20, such as performing multi-agent state control methods of the multi-cluster game, etc.
Embodiment four
The present invention also provides a storage medium storing one or more programs executable by one or more processors to implement the steps of the multi-agent state control method for multi-cluster gaming as described above.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A multi-agent state control method for multi-cluster gaming, the method comprising:
acquiring a first communication relationship between the intelligent agents in each cluster of the intelligent agent system and a second communication relationship between the leading intelligent agents of each cluster;
Determining first communication parameters from each intelligent agent to neighbor intelligent agents in each cluster according to the first communication relation, and determining second communication parameters from each cluster to neighbor clusters according to the second communication relation;
the determining, according to the first communication relationship, first communication parameters from each agent to neighbor agents in each cluster includes:
neighbor agents within each cluster communicate in an undirected manner; the neighbor agent set of agent i within cluster j is denoted […], the first communication parameter from agent i to agent k is denoted […], and the first communication parameter from agent k to agent i is denoted […];
the determining, according to the second communication relationship, second communication parameters from each cluster to neighbor clusters includes:
the leading agent of a cluster communicates with the leading agent of a neighbor cluster in a directed manner; the neighbor cluster set of cluster j is denoted […], the second communication parameter from cluster j to cluster l is denoted […], and when the leading agent of cluster j can receive information from the leading agent of cluster l, […];
constructing an agent state control function according to a preset inequality constraint, a preset cost function, each first communication parameter and each second communication parameter;
the cost function is: […]
where […] is the local cost function of agent i in cluster j, and […] is a convex function;
the state decision variable of agent i in cluster j is denoted […], where j ∈ {1, …, m}, i ∈ {1, …, n_j}, […]; m denotes the number of clusters in the agent system, n_j denotes the number of agents in cluster j, and n denotes the number of agents in the agent system; the stacked form of the state decision variables of all agents in cluster j is […]; the state decision variables of all participating agents other than x_j are denoted […]; […], as a part of x_{-j}, is defined as the information of x_{-j} that agent i in cluster j can receive;
for all j ∈ {1, …, m} and i ∈ {1, …, n_j}, the cost function […] is twice continuously differentiable and strongly convex in […] for given x_{-j}, and […] is a lower semi-continuous closed convex function in […] for given x_{-j};
And controlling the state of each agent at each moment according to the agent state control function so as to ensure that the agent system achieves Nash equilibrium.
2. The multi-agent state control method of multi-cluster gaming of claim 1, wherein the inequality constraint of cluster j is g j.
3. The multi-agent state control method of multi-cluster gaming of claim 2, wherein, for given x_{-j}, g_j is a lower semi-continuous closed convex function in […], and g_j is K_{2,j}-Lipschitz continuous, where K_{2,j} > 0 is a constant.
4. The multi-agent state control method of multi-cluster gaming of claim 3, wherein the agent state control function is:
where […] denotes the state decision vector of the leading agent of cluster j; […] y_j, ν_j, η_j, ω_j and r_j are auxiliary variables; […] and, when i ≠ 1, […]; and α_1, α_2, α_3 and α_4 are constants.
5. The multi-agent state control method of multi-cluster gaming of claim 4, wherein said controlling the state of each agent according to said agent state control function comprises:
determining a state decision vector of each agent at each moment according to the agent state control function;
determining the target state of each agent at each moment according to its state decision vector at that moment, and controlling each agent to reach, at each moment, the target state corresponding to that moment.
6. A multi-agent state control device for multi-cluster game, characterized in that the multi-agent state control device for multi-cluster game comprises:
the communication relation acquisition module is used for acquiring a first communication relation between the intelligent agents in each cluster of the intelligent agent system and a second communication relation between the leading intelligent agents of each cluster;
The communication parameter determining module is used for determining first communication parameters from each intelligent agent to a neighbor intelligent agent in each cluster according to the first communication relation and determining second communication parameters from each cluster to the neighbor cluster according to the second communication relation;
the determining, according to the first communication relationship, first communication parameters from each agent to neighbor agents in each cluster includes:
neighbor agents within each cluster communicate in an undirected manner; the neighbor agent set of agent i within cluster j is denoted […], the first communication parameter from agent i to agent k is denoted […], and the first communication parameter from agent k to agent i is denoted […];
the determining, according to the second communication relationship, second communication parameters from each cluster to neighbor clusters includes:
the leading agent of a cluster communicates with the leading agent of a neighbor cluster in a directed manner; the neighbor cluster set of cluster j is denoted […], the second communication parameter from cluster j to cluster l is denoted […], and when the leading agent of cluster j can receive information from the leading agent of cluster l, […];
the function construction module is used for constructing an agent state control function according to a preset inequality constraint, a preset cost function, each first communication parameter and each second communication parameter;
the cost function is: […]
where […] is the local cost function of agent i in cluster j, and […] is a convex function;
the state decision variable of agent i in cluster j is denoted […], where j ∈ {1, …, m}, i ∈ {1, …, n_j}, […]; m denotes the number of clusters in the agent system, n_j denotes the number of agents in cluster j, and n denotes the number of agents in the agent system; the stacked form of the state decision variables of all agents in cluster j is […]; the state decision variables of all participating agents other than x_j are denoted […]; […], as a part of x_{-j}, is defined as the information of x_{-j} that agent i in cluster j can receive;
for all j ∈ {1, …, m} and i ∈ {1, …, n_j}, the cost function […] is twice continuously differentiable and strongly convex in […] for given x_{-j}, and […] is a lower semi-continuous closed convex function in […] for given x_{-j};
And the state control module is used for controlling the state of each intelligent agent according to the intelligent agent state control function so as to ensure that the intelligent agent system achieves Nash equilibrium.
7. The multi-agent state control device of multi-cluster gaming of claim 6, wherein the agent state control function is:
where the state decision variable of agent i in cluster j is denoted […], with j ∈ {1, …, m}, i ∈ {1, …, n_j}, […]; m denotes the number of clusters in the agent system, n_j denotes the number of agents in cluster j, and n denotes the number of agents in the agent system; the stacked form of the state decision variables of all agents in cluster j is […]; the state decision variables of all participating agents other than x_j are denoted […]; […], as a part of x_{-j}, is defined as the information of x_{-j} that agent i of cluster j can receive; the inequality constraint of cluster j is g_j; […] denotes the state decision vector of the leading agent of cluster j; […] y_j, ν_j, η_j and ω_j are auxiliary variables; […] and, when i ≠ 1, […]; α_1, α_2, α_3 and α_4 are constants; the neighbor set of agent i within cluster j is denoted […]; the first communication parameter from agent i to agent k is denoted […]; the neighbor cluster set of cluster j is denoted […]; the second communication parameter from cluster j to cluster l is denoted […]; and […] is the local cost function of agent i in cluster j.
8. A terminal, the terminal comprising: a processor, a storage medium communicatively coupled to the processor, the storage medium adapted to store a plurality of instructions, the processor adapted to invoke the instructions in the storage medium to perform the steps of implementing the multi-agent state control method of the multi-cluster gaming of any of the preceding claims 1-5.
9. A storage medium storing one or more programs executable by one or more processors to implement the steps of the multi-agent state control method of multi-cluster gaming of any of claims 1-5.
CN202110923586.9A 2021-08-12 2021-08-12 Multi-agent state control method, device and terminal for multi-cluster game Active CN113778619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110923586.9A CN113778619B (en) 2021-08-12 2021-08-12 Multi-agent state control method, device and terminal for multi-cluster game

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110923586.9A CN113778619B (en) 2021-08-12 2021-08-12 Multi-agent state control method, device and terminal for multi-cluster game

Publications (2)

Publication Number Publication Date
CN113778619A CN113778619A (en) 2021-12-10
CN113778619B true CN113778619B (en) 2024-05-14

Family

ID=78837438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110923586.9A Active CN113778619B (en) 2021-08-12 2021-08-12 Multi-agent state control method, device and terminal for multi-cluster game

Country Status (1)

Country Link
CN (1) CN113778619B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114488802B (en) * 2022-01-18 2023-11-03 周佳玲 Nash equilibrium appointed time searching method for intra-group decision-making consistent multi-group game
CN115097726B (en) * 2022-04-25 2023-03-10 深圳市人工智能与机器人研究院 Intelligent agent consensus control method, device, equipment and storage terminal
CN115333956B (en) * 2022-10-17 2023-01-31 南京信息工程大学 Multi-agent state control method for multi-union non-cooperative game

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105759633A (en) * 2016-05-04 2016-07-13 华东交通大学 Controllable contained control method of multi-robot system with strongly-connected components
CN110109351A (en) * 2019-04-08 2019-08-09 广东工业大学 A kind of multiple agent consistency control method based on specified performance
WO2020024170A1 (en) * 2018-08-01 2020-02-06 东莞理工学院 Nash equilibrium strategy and social network consensus evolution model in continuous action space
CN112000108A (en) * 2020-09-08 2020-11-27 北京航空航天大学 Multi-agent cluster grouping time-varying formation tracking control method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040134337A1 (en) * 2002-04-22 2004-07-15 Neal Solomon System, methods and apparatus for mobile software agents applied to mobile robotic vehicles
US20190354100A1 (en) * 2018-05-21 2019-11-21 Board Of Regents, The University Of Texas System Bayesian control methodology for the solution of graphical games with incomplete information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
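Event-Triggering Containment Control for a Class of Multi-Agent Networks With Fixed and Switching Topologies; Wenbing Zhang et al.; IEEE Transactions on Circuits and Systems–I: Regular Papers; 2017-03-31; vol. 64, no. 3; p. 620, right column, line […] to last line, Examples section, fig. 3, Section V (Conclusion) *
Evolutionary game method for multi-agent coordinated control (多智能体协调控制的演化博弈方法); Wang Long; Du Jinming; Journal of Systems Science and Mathematical Sciences (系统科学与数学); 2016-03-15 (no. 03); full text *
Performance optimization of multi-agent systems (多智能体系统的性能优化); Ma Jingying; Zheng Yuanshi; Wang Long; Journal of Systems Science and Mathematical Sciences (系统科学与数学); 2015-03-15 (no. 03); full text *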

Also Published As

Publication number Publication date
CN113778619A (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant