CN114298244A - Decision control method, device and system for intelligent agent group interaction - Google Patents

Info

Publication number: CN114298244A
Authority: CN (China)
Application number: CN202111676244.8A
Other languages: Chinese (zh)
Inventors: 余超, 刘岳鑫
Current Assignee: Sun Yat Sen University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Sun Yat Sen University
Application filed by Sun Yat Sen University
Priority: CN202111676244.8A
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Prior art keywords: decision control, model, behavior, state, group

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract

The invention discloses a decision control method, a decision control device and a decision control system for intelligent agent group interaction. The decision control device comprises an initial interaction unit, a model training unit and a decision control unit. The decision control system also comprises a decision control module and a data storage module. By constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-level and bottom-level fusion training on the initial decision control model, a final decision control model is obtained and then decision control is performed.

Description

Decision control method, device and system for intelligent agent group interaction
Technical Field
The invention belongs to the field of decision control of intelligent agent group interaction, and relates to a decision control method, a decision control device and a decision control system of intelligent agent group interaction.
Background
In large-scale group interaction scenarios, such as massively multiplayer online role-playing games, equity trading markets, online advertisement auctions, urban traffic flow and military intelligent swarms, massive numbers of individuals act concurrently on the same environment and adjust their own strategies in real time. This dynamism and scale pose new challenges for multi-agent reinforcement learning algorithms.
In the prior art, decision control of group interaction is generally performed with the MADDPG algorithm based on the Centralized Training with Decentralized Execution (CTDE) learning paradigm, the VDN algorithm based on the value decomposition idea, or a learning method based on mean field theory. The MADDPG algorithm uses a centrally controlled Critic network in the training stage to acquire the states, behaviors and target policies of all individuals, while in the execution stage each agent's Actor makes decisions from local information only. In the VDN algorithm, each agent maximizes the global gain function by maximizing its own local gain function, thereby realizing cooperation among multiple agents (cooperation is achieved by modeling the interactions among individuals). The learning method based on mean field theory expresses state and action information macroscopically at the group level, and thus better alleviates the curse of dimensionality and the complex interactions in group decision making.
However, the prior art still has the following drawback: cooperation among individuals, cooperation between an individual and its neighboring agents, and cooperation between groups cannot be taken into account simultaneously, so the decision control effect during group interaction is poor.
Therefore, there is a need for a decision control method, apparatus and system for agent group interaction that overcomes the above drawbacks of the prior art.
Disclosure of Invention
In view of the above technical problems, an object of the present invention is to provide a method, an apparatus and a system for controlling a decision of group interaction of agents, so as to improve the effectiveness of decision control during group interaction of agents.
The invention provides a decision control method for group interaction of an agent, which comprises the following steps: acquiring a preset initial decision control model, and enabling an intelligent agent group to perform group interaction according to the initial decision control model so as to acquire an initial decision control data set; the initial decision control model comprises a top-layer learning model and a bottom-layer learning model; training the top-level learning model and the bottom-level learning model by using the initial decision control data set so as to obtain a final decision control model; and carrying out decision control on the group interaction of the intelligent agent according to the final decision control model.
In one embodiment, acquiring a preset initial decision control model and enabling the agent group to perform group interaction according to it, thereby obtaining an initial decision control data set, specifically comprises: acquiring a preset initial decision control model and a preset opponent model of an opponent, initializing a preset group interaction platform, and acquiring a first state of the agent and a second state of the opponent, the initial decision control model comprising a local neural network; inputting the first state into the local neural network to obtain a first behavior and a first reward, inputting the second state into the opponent model to obtain a second behavior and a second reward, and storing the first state, the second state, the first behavior, the second behavior, the first reward and the second reward in the initial decision control data set; inputting the first behavior and the second behavior into the group interaction platform to correspondingly obtain a third state of the agent and a fourth state of the opponent; and inputting the third state into the local neural network to obtain a third behavior and a third reward, inputting the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and storing the third state, the fourth state, the third behavior, the fourth behavior, the third reward and the fourth reward in the initial decision control data set.
In one embodiment, training the top-level learning model and the bottom-level learning model using the initial decision control data set to obtain a final decision control model specifically comprises: dividing the agent group into a corresponding number of groups according to a preset group count, and acquiring the average behavior value and the reward sum value of each group from the initial decision control data set; acquiring a learning objective according to the average behavior value and the reward sum value of each group; training the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, training the bottom-level learning model according to the first mean neural network and the initial decision control data set, and recording the number of training iterations; judging whether the number of training iterations has reached a preset iteration threshold; and when it has, stopping training and outputting the final decision control model.
In one embodiment, after judging whether the number of training iterations has reached the preset iteration threshold, the method further comprises: when the number of training iterations has not reached the preset iteration threshold, continuing the model training.
The invention also provides a decision control device for group interaction of the intelligent agents, which comprises an initial interaction unit, a model training unit and a decision control unit, wherein the initial interaction unit is used for acquiring a preset initial decision control model, so that the group interaction of the intelligent agent group is carried out according to the initial decision control model, and an initial decision control data set is acquired; the model training unit is used for training a preset top-layer learning model and a preset bottom-layer learning model by using the initial decision control data set so as to obtain a final decision control model; and the decision control unit is used for carrying out decision control on the group interaction of the intelligent agent according to the final decision control model.
In one embodiment, the initial interaction unit is further configured to: acquire a preset initial decision control model and a preset opponent model of an opponent, initialize a preset group interaction platform, and acquire a first state of the agent and a second state of the opponent, the initial decision control model comprising a local neural network; input the first state into the local neural network to obtain a first behavior and a first reward, input the second state into the opponent model to obtain a second behavior and a second reward, and store the first state, the second state, the first behavior, the second behavior, the first reward and the second reward in the initial decision control data set; input the first behavior and the second behavior into the group interaction platform to correspondingly obtain a third state of the agent and a fourth state of the opponent; and input the third state into the local neural network to obtain a third behavior and a third reward, input the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and store the third state, the fourth state, the third behavior, the fourth behavior, the third reward and the fourth reward in the initial decision control data set.
In one embodiment, the model training unit is further configured to: divide the agent group into a corresponding number of groups according to a preset group count, and acquire the average behavior value and the reward sum value of each group from the initial decision control data set; acquire a learning objective according to the average behavior value and the reward sum value of each group; train the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, train the bottom-level learning model according to the first mean neural network and the initial decision control data set, and record the number of training iterations; judge whether the number of training iterations has reached a preset iteration threshold; and when it has, stop training and output the final decision control model.
The invention also provides a decision control system for group interaction of the intelligent agents, which further comprises a decision control module and a data storage module, wherein the decision control module is in communication connection with the data storage module, the decision control module is used for carrying out group interaction decision control on the intelligent agent group according to the decision control method for group interaction of the intelligent agents, and the data storage module is used for storing all data.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the invention provides a decision control method, a decision control device and a decision control system for group interaction of an intelligent agent.
Drawings
The invention will be further described with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of one embodiment of a method for decision control of agent population interaction in accordance with the present invention;
FIG. 2 shows a schematic of the training of the top-level learning model and the bottom-level learning model;
FIG. 3 is a block diagram illustrating one embodiment of an intelligent agent group interaction decision control apparatus in accordance with the present invention;
FIG. 4 is a block diagram illustrating one embodiment of an intelligent agent group interaction decision control system in accordance with the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First embodiment
The embodiment of the invention first describes a decision control method for intelligent agent group interaction. FIG. 1 is a flow chart illustrating one embodiment of a method for decision control of agent population interaction in accordance with the present invention.
As shown in fig. 1, the decision control method includes the following steps:
and S1, acquiring a preset initial decision control model, and enabling intelligent agent groups to perform group interaction according to the initial decision control model, thereby acquiring an initial decision control data set.
The initial decision control model includes a top-level learning model and a bottom-level learning model.
In one embodiment, acquiring the preset initial decision control model and enabling the agent group to perform group interaction according to it, thereby obtaining the initial decision control data set, specifically comprises: acquiring a preset initial decision control model and a preset opponent model of an opponent, initializing a preset group interaction platform, and acquiring a first state of the agent and a second state of the opponent, the initial decision control model comprising a local neural network; inputting the first state into the local neural network to obtain a first behavior and a first reward, inputting the second state into the opponent model to obtain a second behavior and a second reward, and storing the first state, the second state, the first behavior, the second behavior, the first reward and the second reward in the initial decision control data set; inputting the first behavior and the second behavior into the group interaction platform to correspondingly obtain a third state of the agent and a fourth state of the opponent; and inputting the third state into the local neural network to obtain a third behavior and a third reward, inputting the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and storing the third state, the fourth state, the third behavior, the fourth behavior, the third reward and the fourth reward in the initial decision control data set.
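The data-collection step above can be sketched as a simple rollout loop. Everything here is illustrative: the `policy_net`, `opponent_model` and `platform` interfaces are assumptions for the sketch, not the patent's actual implementation.

```python
def collect_initial_dataset(policy_net, opponent_model, platform, steps=2):
    """Sketch of step S1: roll the initial decision control model out against
    a preset opponent model on the interaction platform and store every
    (state, behavior, reward) tuple in the initial decision control data set."""
    dataset = []
    agent_state, opponent_state = platform.reset()
    for _ in range(steps):
        # The local neural network maps the agent's state to a behavior and reward;
        # the opponent model does the same for the opponent's state.
        agent_behavior, agent_reward = policy_net(agent_state)
        opp_behavior, opp_reward = opponent_model(opponent_state)
        dataset.append((agent_state, opponent_state,
                        agent_behavior, opp_behavior,
                        agent_reward, opp_reward))
        # Both behaviors act on the shared platform, yielding the next states
        # (the "third" and "fourth" states in the description).
        agent_state, opponent_state = platform.step(agent_behavior, opp_behavior)
    return dataset
```

The loop stores one transition per step, matching the first/second then third/fourth state pattern in the text.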
And S2, training the top-layer learning model and the bottom-layer learning model by utilizing the initial decision control data set so as to obtain a final decision control model.
To further illustrate the initial decision control model, FIG. 2 shows a schematic of the training of the top-level learning model and the bottom-level learning model.
Here, top-level learning coordinates the cooperation among the groups from a macroscopic perspective. Using the CTDE learning paradigm, the learned cooperative information is applied to the mean-field Q-value network of each group: the mean-field information of the groups is used to obtain the mean-field Q value corresponding to each group, and these values are summed to obtain a global Q value. Meanwhile, a global reward is formed from the true rewards received by the individual agents, yielding the top-level learning objective. The parameters of the mean-field Q-value network are updated based on this objective, and the updated network is passed down to the bottom-level groups for learning. The bottom-level model realizes cooperation among the agents within each group based on the mean-field idea and an attention mechanism. Bottom-level learning is the collaborative learning of agents within the same group; in this process, each group receives only the mean-field Q network whose top-level update has been completed, and no information about the agents of other groups. Each group contains a plurality of agents.
In practical application, in the top-level learning stage, based on each group's state-information set $o_i$ and action-information set $a_i$, the top level relies on the weight matrix $w(x_i)$ passed up by the bottom level and the statistical group information $\mu(x_i)$, where $i$ is the group index, $k$ is the agent index, and $x_i = \mathrm{concat}(o_i, a_i)$, i.e. the concatenation of state information and action information. $\mu(x_i)$ is computed as

$$\mu(x_i) = \frac{1}{N(i)} \sum_{k=1}^{N(i)} x_i^k,$$

where $N(i)$ is the number of agents in group $i$, $x_i^k = \mathrm{concat}(o_i^k, a_i^k)$, $o_i^k$ denotes the state information of the $k$-th agent in group $i$, and $a_i^k$ denotes the action information of the $k$-th agent in group $i$.
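Under the reading above, the mean-field statistic is simply the average of the agents' concatenated state-action vectors. The function below is a minimal sketch of that computation; the array shapes are assumptions for illustration.

```python
import numpy as np

def group_mean_field(obs, acts):
    """Mean-field statistic mu(x_i) for one group: the average of the
    concatenated state/action vectors of the group's N(i) agents.
    obs: array of shape (N_i, obs_dim); acts: array of shape (N_i, act_dim)."""
    x = np.concatenate([obs, acts], axis=1)  # x_i^k = concat(o_i^k, a_i^k)
    return x.mean(axis=0)                    # average over the group's agents
```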
Subsequently, the statistical information $\mu(x_i)$ is passed to the mean-field Q-value network, which computes the mean-field Q value $Q^{MF}(\mu(x_i))$ corresponding to each group; finally, the summing module produces the global Q value $Q_{tot}$:

$$Q_{tot} = \sum_{i=1}^{m} Q^{MF}(\mu(x_i)),$$

where $m$ is the number of groups.
After the global Q value is obtained, following the CTDE idea, the loss function $L_{tot}$ is minimized to optimize $Q_{tot}$ and, in turn, $Q^{MF}(\mu(x_i))$. The loss function is

$$L_{tot} = \left(y_{tot} - Q_{tot}\right)^2,$$

where $y_{tot}$ is the top-level learning target formed from the global reward. The individual-global-max (IGM) principle is used here: when the mean-field Q value $Q^{MF}(\mu(x_i))$ of every group reaches its maximum, the global Q value $Q_{tot}$ is maximized accordingly. Also following the CTDE idea, the mean-field Q values $Q^{MF}(\mu(x_i))$ of all groups are trained centrally during top-level learning, but each bottom-level group uses only its own mean-field Q value $Q^{MF}(\mu(x_i))$; the mean-field Q values of the other groups are not collected.
In practical application, in the bottom-level learning stage, the group mean information $\mu(x_i)$ serves as the central node. Based on the attention mechanism, an importance weight $\alpha_i^k$ is computed for each agent $k$ in group $i$:

$$\alpha_i^k = \operatorname{softmax}_k\!\big((W_Q\,\mu(x_i))^{\top}(W_K\,x_i^k)\big),$$

where $W_K$ is the parameter of the $K_{key}$ network and $W_Q$ is the parameter of the $K_{query}$ network; the $K_{key}$ and $K_{query}$ networks are embedding networks internal to the attention module. In addition, each agent obtains its local Q value $Q_i^k$ at time step $t$ through a local Q-value network.
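The importance-weight computation can be sketched with plain dot-product attention, using the group mean as the query (central node) and each agent's state-action vector as a key. The softmax normalization and the absence of score scaling are assumptions; the patent's exact weight formula is rendered as an image in the source.

```python
import numpy as np

def importance_weights(mu, xs, w_query, w_key):
    """Attention-based importance weight alpha_i^k for each agent in a group.
    mu: group mean mu(x_i) (the query / central node);
    xs: list of per-agent vectors x_i^k (the keys);
    w_query, w_key: parameters of the K_query and K_key embedding networks."""
    q = w_query @ mu                                  # query from the group mean
    scores = np.array([q @ (w_key @ x) for x in xs])  # dot-product attention scores
    scores -= scores.max()                            # numerical stability for softmax
    e = np.exp(scores)
    return e / e.sum()                                # weights sum to 1 over the group
```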
Subsequently, the mean-field Q value $Q^{MF}(\mu(x_i))$ passed down from the top level is fused in, and the group local Q value $Q_i$ is obtained through the weighted summation module:

$$Q_i = Q^{MF}(\mu(x_i)) + \sum_{k=1}^{N(i)} \alpha_i^k\, Q_i^k,$$

where $\alpha_i^k$ is the importance weight of the $k$-th agent in group $i$. Through the group local Q value $Q_i$, cooperation among the multiple agents within the group and cooperation between the bottom level and the top level can both be realized.
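The weighted summation module can be sketched as follows. Additive fusion of the top-level mean-field Q value with the attention-weighted sum of the agents' local Q values is an assumption inferred from the surrounding description.

```python
import numpy as np

def group_local_q(q_mf, local_qs, weights):
    """Group local Q value sketch: fuse the top-level mean-field Q value
    Q_MF(mu(x_i)) with the attention-weighted sum of the agents' local
    Q values (assumed additive fusion)."""
    local_qs = np.asarray(local_qs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(q_mf + np.dot(weights, local_qs))  # Q_i = Q_MF + sum_k alpha_k * Q_k
```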
On this basis, the loss function $L_i$ is minimized to train the group Q function:

$$L_i = \left(y_i - Q_i\right)^2,$$

where $Q^{MF}(\mu(x_i))$ is the guideline passed from the top level to the bottom level, i.e. the mean-field Q value delivered by the top level; $r_i$ is the true reward value returned by the environment to group $i$; and $y_i$ is the training target of group $i$, formed from $r_i$ and the top-level mean-field Q value.
In one embodiment, training the top-level learning model and the bottom-level learning model using the initial decision control data set to obtain a final decision control model specifically comprises: dividing the agent group into a corresponding number of groups according to a preset group count, and acquiring the average behavior value and the reward sum value of each group from the initial decision control data set; acquiring a learning objective according to the average behavior value and the reward sum value of each group; training the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, training the bottom-level learning model according to the first mean neural network and the initial decision control data set, and recording the number of training iterations; judging whether the number of training iterations has reached a preset iteration threshold; and when it has, stopping training and outputting the final decision control model.
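The fusion-training procedure just described (train the top level, hand the updated mean network down to the bottom level, and count iterations against a preset threshold) can be sketched as a simple loop; `train_top` and `train_bottom` are hypothetical stand-ins for the patent's actual training steps.

```python
def fusion_training(top_model, bottom_model, dataset, max_iters=100):
    """Sketch of step S2: alternate top-level and bottom-level training
    until the preset iteration threshold is reached, then return the
    final decision control model (top + bottom)."""
    iters = 0
    while iters < max_iters:  # preset iteration threshold
        # Top-level training yields the updated mean-field network, which
        # is then passed down to the bottom-level (grouped) training.
        mean_field_net = top_model.train_top(dataset)
        bottom_model.train_bottom(mean_field_net, dataset)
        iters += 1            # record the number of training iterations
    return top_model, bottom_model
```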
And S3, performing decision control on the group interaction of the intelligent agent according to the final decision control model.
In one embodiment, after judging whether the number of training iterations has reached the preset iteration threshold, the method further comprises: when the number of training iterations has not reached the preset iteration threshold, continuing the model training.
The embodiment of the invention thus describes a decision control method for agent group interaction: by constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-and-bottom fusion training on it, a final decision control model is obtained and used for decision control.
Second embodiment
Besides the method, the embodiment of the invention also describes a decision control device for intelligent agent group interaction. Fig. 3 is a block diagram of an embodiment of a decision control device for group interaction of agents according to the present invention.
As shown, the decision control device includes an initial interaction unit 11, a model training unit 12, and a decision control unit 13.
The initial interaction unit 11 is configured to obtain a preset initial decision control model, so that the agent group performs group interaction according to it, thereby obtaining an initial decision control data set. In one embodiment, the initial interaction unit 11 is further configured to: acquire a preset initial decision control model and a preset opponent model of an opponent, initialize a preset group interaction platform, and acquire a first state of the agent and a second state of the opponent, the initial decision control model comprising a local neural network; input the first state into the local neural network to obtain a first behavior and a first reward, input the second state into the opponent model to obtain a second behavior and a second reward, and store the first state, the second state, the first behavior, the second behavior, the first reward and the second reward in the initial decision control data set; input the first behavior and the second behavior into the group interaction platform to correspondingly obtain a third state of the agent and a fourth state of the opponent; and input the third state into the local neural network to obtain a third behavior and a third reward, input the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and store the third state, the fourth state, the third behavior, the fourth behavior, the third reward and the fourth reward in the initial decision control data set.
The model training unit 12 is configured to train a preset top-level learning model and a preset bottom-level learning model using the initial decision control data set, so as to obtain a final decision control model. In one embodiment, the model training unit 12 is further configured to: divide the agent group into a corresponding number of groups according to a preset group count, and acquire the average behavior value and the reward sum value of each group from the initial decision control data set; acquire a learning objective according to the average behavior value and the reward sum value of each group; train the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, train the bottom-level learning model according to the first mean neural network and the initial decision control data set, and record the number of training iterations; judge whether the number of training iterations has reached a preset iteration threshold; and when it has, stop training and output the final decision control model.
And the decision control unit 13 is configured to perform decision control on group interaction of the agents according to the final decision control model.
The embodiment of the invention thus discloses a decision control device for agent group interaction: by constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-and-bottom fusion training on it, a final decision control model is obtained and used for decision control. The device thereby improves the effectiveness of decision control during agent group interaction.
Third embodiment
Besides the method and the device, the invention also describes a decision control system for intelligent agent group interaction. FIG. 4 is a block diagram illustrating one embodiment of an intelligent agent group interaction decision control system in accordance with the present invention.
As shown in the figure, the decision control system further includes a decision control module 1 and a data storage module 2, the decision control module 1 being in communication connection with the data storage module 2. The decision control module 1 is configured to perform decision control of group interaction on the agent group according to the above decision control method for agent group interaction, and the data storage module 2 is configured to store all data.
In practical application, the decision control module 1 and the data storage module 2 are respectively connected to the intelligent agent group in a communication manner, so that the decision control module 1 can perform group interaction decision control on the intelligent agent group.
The embodiment of the invention thus discloses a decision control system for agent group interaction: by constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-and-bottom fusion training on it, a final decision control model is obtained and used for decision control. The system thereby improves the effectiveness of decision control during agent group interaction.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the invention, may occur to those skilled in the art and are intended to be included within the scope of the invention.

Claims (8)

1. A decision control method for group interaction of intelligent agents is characterized by comprising the following steps:
acquiring a preset initial decision control model, and enabling an intelligent agent group to perform group interaction according to the initial decision control model so as to acquire an initial decision control data set; the initial decision control model comprises a top-layer learning model and a bottom-layer learning model;
training the top-level learning model and the bottom-level learning model by using the initial decision control data set so as to obtain a final decision control model;
and carrying out decision control on the group interaction of the intelligent agent according to the final decision control model.
2. The method for controlling decision making for agent group interaction according to claim 1, wherein obtaining a preset initial decision control model, and enabling an agent group to perform group interaction according to the initial decision control model, thereby obtaining an initial decision control data set specifically comprises:
acquiring a preset initial decision control model and a preset opponent model of an opponent, initializing a preset group interaction platform, and acquiring a first state of an agent and a second state of the opponent; the initial decision control model comprises a local neural network;
inputting the first state into the local neural network to obtain a first behavior and a first reward, inputting the second state into the opponent model to obtain a second behavior and a second reward, and storing the first state, the second state, the first behavior, the second behavior, the first reward and the second reward in the initial decision control data set;
inputting the first behavior and the second behavior into the group interaction platform, so as to correspondingly obtain a third state of the agent and a fourth state of the opponent;
inputting the third state into the local neural network to obtain a third behavior and a third reward, inputting the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and storing the third state, the fourth state, the third behavior, the fourth behavior, the third reward and the fourth reward in the initial decision control data set.
3. The decision control method according to claim 2, wherein the step of training the top-level learning model and the bottom-level learning model using the initial decision control data set to obtain a final decision control model specifically comprises:
dividing the agent group into a corresponding number of groups according to a preset group number, and acquiring the average behavior value and the reward sum value of each group according to the initial decision control data set;
acquiring a learning objective according to the average behavior value and the reward sum value of each group;
training the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, training the bottom-level learning model according to the first mean neural network and the initial decision control data set, and recording the number of training iterations;
judging whether the number of training iterations reaches a preset threshold;
and when the number of training iterations reaches the preset threshold, stopping training and outputting the final decision control model.
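The grouping, per-group statistics, and iteration-count stopping rule of claim 3 can be sketched as below. The grouping rule, the placeholder behavior/reward data, and the combination of statistics into a single objective value are assumptions for illustration; the actual top-level and bottom-level model updates are reduced to comments:

```python
# Minimal sketch of the claim-3 fusion training loop; the grouping scheme,
# the toy data, and the objective formula are illustrative assumptions.
def split_into_groups(agent_ids, num_groups):
    """Divide the agent population into the preset number of groups."""
    groups = [[] for _ in range(num_groups)]
    for i, aid in enumerate(agent_ids):
        groups[i % num_groups].append(aid)
    return groups

def group_statistics(group, behaviors, rewards):
    """Average behavior value and reward sum value of one group."""
    mean_behavior = sum(behaviors[a] for a in group) / len(group)
    reward_sum = sum(rewards[a] for a in group)
    return mean_behavior, reward_sum

def train(dataset, num_groups=2, threshold=5):
    agent_ids = list(range(6))
    behaviors = {a: float(a) for a in agent_ids}  # placeholder behavior values
    rewards = {a: 1.0 for a in agent_ids}         # placeholder rewards
    groups = split_into_groups(agent_ids, num_groups)
    count = 0
    while count < threshold:  # stop once the preset iteration threshold is hit
        stats = [group_statistics(g, behaviors, rewards) for g in groups]
        objective = sum(m + r for m, r in stats)  # stand-in learning objective
        # top_model.update(objective, dataset)     -> first mean neural network
        # bottom_model.update(mean_net, dataset)   -> bottom-level policy
        count += 1
    return count, objective

print(train([]))  # prints (5, 11.0)
```

With six agents in two groups, group [0, 2, 4] has mean behavior 2.0 and reward sum 3.0, group [1, 3, 5] has mean behavior 3.0 and reward sum 3.0, so the placeholder objective is 11.0 and training stops after five iterations.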
4. The method according to claim 3, further comprising, after judging whether the number of training iterations reaches the preset threshold:
when the number of training iterations does not reach the preset threshold, continuing the model training.
5. A decision control device for agent group interaction, characterized by comprising an initial interaction unit, a model training unit and a decision control unit, wherein
the initial interaction unit is configured to acquire a preset initial decision control model and enable the agent group to perform group interaction according to the initial decision control model, thereby acquiring an initial decision control data set;
the model training unit is configured to train a preset top-level learning model and a preset bottom-level learning model using the initial decision control data set to obtain a final decision control model;
and the decision control unit is configured to perform decision control on the group interaction of the agent group according to the final decision control model.
6. The decision control device for agent group interaction according to claim 5, wherein the initial interaction unit is further configured to:
acquire a preset initial decision control model and a preset opponent model, initialize a preset group interaction platform, and acquire a first state of the agent group and a second state of the opponent, wherein the initial decision control model comprises a local neural network;
input the first state into the local neural network to obtain a first behavior and a first reward, input the second state into the opponent model to obtain a second behavior and a second reward, and store the first state, the second state, the first behavior, the second behavior, the first reward and the second reward into the initial decision control data set;
input the first behavior and the second behavior into the group interaction platform to correspondingly obtain a third state of the agent group and a fourth state of the opponent;
input the third state into the local neural network to obtain a third behavior and a third reward, input the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and store the third state, the fourth state, the third behavior, the fourth behavior, the third reward and the fourth reward into the initial decision control data set.
7. The decision control device for agent group interaction according to claim 6, wherein the model training unit is further configured to:
divide the agent group into a corresponding number of groups according to a preset group number, and acquire the average behavior value and the reward sum value of each group according to the initial decision control data set;
acquire a learning objective according to the average behavior value and the reward sum value of each group;
train the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, train the bottom-level learning model according to the first mean neural network and the initial decision control data set, and record the number of training iterations;
judge whether the number of training iterations reaches a preset threshold;
and when the number of training iterations reaches the preset threshold, stop training and output the final decision control model.
8. A decision control system for agent group interaction, characterized by comprising a decision control module and a data storage module, wherein the decision control module is in communication connection with the data storage module; the decision control module is configured to perform group interaction decision control on the agent group according to the decision control method for agent group interaction of any one of claims 1 to 4, and the data storage module is configured to store all data.
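The system of claim 8 reduces to two communicating components. The sketch below is one possible wiring, with all class and method names chosen for illustration; the final decision control model is stubbed out as a trivial policy:

```python
# Minimal sketch of the claim-8 system: a decision control module in
# communication connection with a data storage module. Names are
# illustrative assumptions, not the patent's implementation.
class DataStorageModule:
    """Stores all data produced during decision control."""
    def __init__(self):
        self._store = []
    def save(self, record):
        self._store.append(record)
    def all(self):
        return list(self._store)

class DecisionControlModule:
    """Runs the final decision control model and logs every decision."""
    def __init__(self, storage):
        self.storage = storage  # the communication connection
    def decide(self, state):
        behavior = -state       # placeholder for the final decision control model
        self.storage.save((state, behavior))
        return behavior

storage = DataStorageModule()
controller = DecisionControlModule(storage)
behavior = controller.decide(1.5)
print(behavior, len(storage.all()))  # prints -1.5 1
```

A production system would replace the in-memory list with a persistent store and the placeholder policy with the trained top/bottom model pair.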
CN202111676244.8A 2021-12-31 2021-12-31 Decision control method, device and system for intelligent agent group interaction Pending CN114298244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111676244.8A CN114298244A (en) 2021-12-31 2021-12-31 Decision control method, device and system for intelligent agent group interaction

Publications (1)

Publication Number Publication Date
CN114298244A true CN114298244A (en) 2022-04-08

Family

ID=80975692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111676244.8A Pending CN114298244A (en) 2021-12-31 2021-12-31 Decision control method, device and system for intelligent agent group interaction

Country Status (1)

Country Link
CN (1) CN114298244A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115840892A (en) * 2022-12-09 2023-03-24 Sun Yat Sen University Multi-agent hierarchical autonomous decision-making method and system in complex environment
CN115840892B (en) * 2022-12-09 2024-04-19 Sun Yat Sen University Multi-agent hierarchical autonomous decision-making method and system in complex environment

Similar Documents

Publication Publication Date Title
US7537523B2 (en) Dynamic player groups for interest management in multi-character virtual environments
CN110852448A (en) Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN111625361A (en) Joint learning framework based on cooperation of cloud server and IoT (Internet of things) equipment
Xu et al. Learning multi-agent coordination for enhancing target coverage in directional sensor networks
CN114415735B (en) Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method
CN114546608B (en) Task scheduling method based on edge calculation
CN114896899B (en) Multi-agent distributed decision method and system based on information interaction
CN113642233B (en) Group intelligent collaboration method for optimizing communication mechanism
CN112634019A (en) Default probability prediction method for optimizing grey neural network based on bacterial foraging algorithm
CN117289691A (en) Training method for path planning agent for reinforcement learning in navigation scene
CN114757362A (en) Multi-agent system communication method based on edge enhancement and related device
CN115022231B (en) Optimal path planning method and system based on deep reinforcement learning
CN114298244A (en) Decision control method, device and system for intelligent agent group interaction
Kamra et al. Deep fictitious play for games with continuous action spaces
Liu et al. Learning communication for cooperation in dynamic agent-number environment
CN116340737A (en) Heterogeneous cluster zero communication target distribution method based on multi-agent reinforcement learning
CN116992928A (en) Multi-agent reinforcement learning method for fair self-adaptive traffic signal control
CN113592079B (en) Collaborative multi-agent communication method oriented to large-scale task space
CN116367190A (en) Digital twin function virtualization method for 6G mobile network
CN114757092A (en) System and method for training multi-agent cooperative communication strategy based on teammate perception
CN110598835B (en) Automatic path-finding method for trolley based on Gaussian variation genetic algorithm optimization neural network
Ebrahimi et al. Dynamic difficulty adjustment in games by using an interactive self-organizing architecture
Liu Shortest path selection algorithm for cold chain logistics transportation based on improved artificial bee colony
Dai et al. Evolutionary neural network for ghost in Ms. Pac-Man
CN118366009B (en) Pedestrian track prediction method and system based on human group behavior characteristic guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination