CN114298244A - Decision control method, device and system for intelligent agent group interaction - Google Patents
- Publication number
- CN114298244A (application number CN202111676244.8A)
- Authority
- CN
- China
- Prior art keywords
- decision control
- model
- behavior
- state
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a decision control method, a decision control device and a decision control system for intelligent agent group interaction. The decision control device comprises an initial interaction unit, a model training unit and a decision control unit. The decision control system also comprises a decision control module and a data storage module. By constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-level and bottom-level fusion training on the initial decision control model, a final decision control model is obtained and then decision control is performed.
Description
Technical Field
The invention belongs to the field of decision control of intelligent agent group interaction, and relates to a decision control method, a decision control device and a decision control system of intelligent agent group interaction.
Background
In large-scale group interaction scenarios, such as massively multiplayer online role-playing games, equity trading markets, online advertisement auctions, urban traffic flow, and military intelligent swarms, massive numbers of individuals act concurrently on the same environment and adjust their own strategies in real time; this dynamism and scale pose new challenges for multi-agent reinforcement learning algorithms.
In the prior art, decision control over group interaction is generally performed through the MADDPG algorithm based on the Centralized Training with Decentralized Execution (CTDE) learning paradigm, the VDN algorithm based on the value-decomposition idea, or a learning method based on mean field theory. The MADDPG algorithm uses a centrally controlled Critic network to acquire the states, behaviors, and target policies of all individuals during the training stage, while each agent's Actor makes decisions according to local information during the execution stage. In the VDN algorithm, each agent maximizes a local gain function so as to maximize the global gain function, thereby achieving cooperation among multiple agents by modeling the interaction among individuals. The learning method based on mean field theory expresses state and action information macroscopically at the group level, thereby better alleviating the problems of dimensional explosion and complex interaction in group decision-making.
However, the prior art still has the following defect: coordination among individuals, coordination between individuals and neighboring agents, and coordination between groups cannot be taken into account simultaneously, so the decision control effect during group interaction is poor.
Therefore, there is a need for a decision control method, device and system for intelligent agent group interaction that overcome the above-mentioned drawbacks of the prior art.
Disclosure of Invention
In view of the above technical problems, an object of the present invention is to provide a decision control method, device and system for agent group interaction, so as to improve the effectiveness of decision control during agent group interaction.
The invention provides a decision control method for group interaction of an agent, which comprises the following steps: acquiring a preset initial decision control model, and enabling an intelligent agent group to perform group interaction according to the initial decision control model so as to acquire an initial decision control data set; the initial decision control model comprises a top-layer learning model and a bottom-layer learning model; training the top-level learning model and the bottom-level learning model by using the initial decision control data set so as to obtain a final decision control model; and carrying out decision control on the group interaction of the intelligent agent according to the final decision control model.
In one embodiment, acquiring a preset initial decision control model and enabling an agent group to perform group interaction according to the initial decision control model, thereby obtaining an initial decision control data set, specifically comprises: acquiring a preset initial decision control model and a preset opponent model of an opponent, initializing a preset group interaction platform, and acquiring a first state of an agent and a second state of the opponent, the initial decision control model comprising a local neural network; inputting the first state into the local neural network to obtain a first behavior and a first reward, inputting the second state into the opponent model to obtain a second behavior and a second reward, and storing the first state, the second state, the first behavior, the second behavior, the first reward, and the second reward into the initial decision control data set; inputting the first behavior and the second behavior into the group interaction platform, thereby correspondingly obtaining a third state of the agent and a fourth state of the opponent; and inputting the third state into the local neural network to obtain a third behavior and a third reward, inputting the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and storing the third state, the fourth state, the third behavior, the fourth behavior, the third reward, and the fourth reward into the initial decision control data set.
In one embodiment, training the top-level learning model and the bottom-level learning model using the initial decision control data set to obtain a final decision control model specifically comprises: dividing the agent group into a corresponding number of groups according to a preset group count, and acquiring the average behavior value and the reward sum of each group according to the initial decision control data set; acquiring a learning target according to the average behavior value and the reward sum of each group; training the top-level learning model according to the learning target and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, training the bottom-level learning model according to the first mean neural network and the initial decision control data set, and recording the number of training iterations; judging whether the number of training iterations reaches a preset threshold; and when it does, stopping training and outputting the final decision control model.
In one embodiment, after judging whether the number of training iterations reaches the preset threshold, the method further comprises: when the number of training iterations has not reached the preset threshold, continuing the model training.
The invention also provides a decision control device for group interaction of the intelligent agents, which comprises an initial interaction unit, a model training unit and a decision control unit, wherein the initial interaction unit is used for acquiring a preset initial decision control model, so that the group interaction of the intelligent agent group is carried out according to the initial decision control model, and an initial decision control data set is acquired; the model training unit is used for training a preset top-layer learning model and a preset bottom-layer learning model by using the initial decision control data set so as to obtain a final decision control model; and the decision control unit is used for carrying out decision control on the group interaction of the intelligent agent according to the final decision control model.
In one embodiment, the initial interaction unit is further configured to: acquire a preset initial decision control model and a preset opponent model of an opponent, initialize a preset group interaction platform, and acquire a first state of an agent and a second state of the opponent, the initial decision control model comprising a local neural network; input the first state into the local neural network to obtain a first behavior and a first reward, input the second state into the opponent model to obtain a second behavior and a second reward, and store the first state, the second state, the first behavior, the second behavior, the first reward, and the second reward into the initial decision control data set; input the first behavior and the second behavior into the group interaction platform, thereby correspondingly obtaining a third state of the agent and a fourth state of the opponent; and input the third state into the local neural network to obtain a third behavior and a third reward, input the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and store the third state, the fourth state, the third behavior, the fourth behavior, the third reward, and the fourth reward into the initial decision control data set.
In one embodiment, the model training unit is further configured to: divide the agent group into a corresponding number of groups according to a preset group count, and acquire the average behavior value and the reward sum of each group according to the initial decision control data set; acquire a learning target according to the average behavior value and the reward sum of each group; train the top-level learning model according to the learning target and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, train the bottom-level learning model according to the first mean neural network and the initial decision control data set, and record the number of training iterations; judge whether the number of training iterations reaches a preset threshold; and when it does, stop training and output the final decision control model.
The invention also provides a decision control system for agent group interaction, comprising a decision control module and a data storage module, the decision control module being communicatively connected to the data storage module, wherein the decision control module is configured to perform decision control of group interaction on the agent group according to the above decision control method for agent group interaction, and the data storage module is configured to store all data.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
The invention provides a decision control method, device and system for agent group interaction which, by constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-bottom fusion training on it, obtain a final decision control model used for decision control, thereby improving the effectiveness of decision control during agent group interaction.
Drawings
The invention will be further described with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of one embodiment of a method for decision control of agent population interaction in accordance with the present invention;
FIG. 2 shows a schematic of the training of the top-level learning model and the bottom-level learning model;
FIG. 3 is a block diagram illustrating one embodiment of an intelligent agent group interaction decision control apparatus in accordance with the present invention;
FIG. 4 is a block diagram illustrating one embodiment of an intelligent agent group interaction decision control system in accordance with the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First Embodiment
The embodiment of the invention first describes a decision control method for intelligent agent group interaction. FIG. 1 is a flow chart illustrating one embodiment of a method for decision control of agent population interaction in accordance with the present invention.
As shown in fig. 1, the decision control method includes the following steps:
S1: acquiring a preset initial decision control model, and enabling the agent group to perform group interaction according to the initial decision control model, thereby acquiring an initial decision control data set.
The initial decision control model includes a top-level learning model and a bottom-level learning model.
In one embodiment, acquiring a preset initial decision control model and enabling an agent group to perform group interaction according to the initial decision control model, thereby obtaining an initial decision control data set, specifically comprises: acquiring a preset initial decision control model and a preset opponent model of an opponent, initializing a preset group interaction platform, and acquiring a first state of an agent and a second state of the opponent, the initial decision control model comprising a local neural network; inputting the first state into the local neural network to obtain a first behavior and a first reward, inputting the second state into the opponent model to obtain a second behavior and a second reward, and storing the first state, the second state, the first behavior, the second behavior, the first reward, and the second reward into the initial decision control data set; inputting the first behavior and the second behavior into the group interaction platform, thereby correspondingly obtaining a third state of the agent and a fourth state of the opponent; and inputting the third state into the local neural network to obtain a third behavior and a third reward, inputting the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and storing the third state, the fourth state, the third behavior, the fourth behavior, the third reward, and the fourth reward into the initial decision control data set.
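The data-collection procedure above can be sketched as follows. Here `local_net`, `opponent_model`, and `platform_step` are toy stand-ins for the local neural network, the opponent model, and the group interaction platform — all hypothetical placeholders, not the patent's actual components:

```python
def local_net(state):
    """Toy local neural network: maps a state to (behavior, reward)."""
    return state * 2, float(state >= 0)

def opponent_model(state):
    """Toy opponent model: maps a state to (behavior, reward)."""
    return -state, float(state < 0)

def platform_step(behavior_a, behavior_b):
    """Toy group-interaction platform: returns the next states of both sides."""
    return behavior_a + 1, behavior_b - 1

def collect_initial_dataset(first_state, second_state):
    """Two interaction rounds stored into the initial decision control data set."""
    dataset = []
    b1, r1 = local_net(first_state)        # first behavior / first reward
    b2, r2 = opponent_model(second_state)  # second behavior / second reward
    dataset.append((first_state, second_state, b1, b2, r1, r2))
    third_state, fourth_state = platform_step(b1, b2)
    b3, r3 = local_net(third_state)        # third behavior / third reward
    b4, r4 = opponent_model(fourth_state)  # fourth behavior / fourth reward
    dataset.append((third_state, fourth_state, b3, b4, r3, r4))
    return dataset
```

The two appended tuples mirror the two storage steps of the embodiment: one before and one after the platform transitions the states.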
S2: training the top-level learning model and the bottom-level learning model using the initial decision control data set, thereby obtaining a final decision control model.
To further illustrate the initial decision control model, FIG. 2 shows a schematic of the training of the top-level learning model and the bottom-level learning model.
Here, top-level learning coordinates the collaboration among the several groups from a macroscopic perspective. Using the CTDE learning paradigm, the learned cooperative information is applied to the mean field Q-value network of each group: the mean field information of the multiple groups is used to obtain the mean field Q value corresponding to each group, and these are summed to obtain a global Q value. Meanwhile, a global reward is obtained from the true rewards received by each agent, forming the top-level learning target. The parameters of the mean field Q-value network are updated based on this target, and the updated network is passed down to the bottom-level groups for learning. The bottom-level model realizes cooperation among the agents within a group based on the mean field idea and an attention mechanism. Bottom-level learning is the collaborative learning of agents within the same group; in this process a group receives only the mean field Q network whose top-level update has been completed, and receives no information about the agents of other groups. Each group contains a plurality of agents.
In practical applications, in the top-level learning phase, the top layer relies on the state information set $o_i$ and the action information set $a_i$ of each group, the weight matrix $w(x_i)$ passed up by the bottom layer, and the statistical group information $\mu(x_i)$, where $i$ is the group index, $k$ is the agent index, and $x_i = \mathrm{concatenate}(o_i, a_i)$, i.e., the concatenation of the state information and the action information. $\mu(x_i)$ is computed as:

$$\mu(x_i) = \frac{1}{N(i)} \sum_{k=1}^{N(i)} x_i^k, \qquad x_i^k = \mathrm{concatenate}(o_i^k, a_i^k)$$

where $N(i)$ denotes the number of agents in the $i$-th group, $o_i^k$ denotes the state information of the $k$-th agent in the $i$-th group, and $a_i^k$ denotes the action information of the $k$-th agent in the $i$-th group.
Subsequently, the statistical information $\mu(x_i)$ is passed to the mean field Q-value network, which computes the $Q^{MF}(\mu(x_i))$ corresponding to each group; finally a global Q value $Q_{tot}$ is obtained through the summation module:

$$Q_{tot} = \sum_{i=1}^{m} Q^{MF}(\mu(x_i))$$

where $m$ is the number of groups.
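The summation module then reduces to a plain sum over the groups; `mf_q_net` below is a hypothetical stand-in for the trained mean field Q-value network:

```python
def global_q(mean_fields, mf_q_net):
    """Q_tot: sum over groups of the mean-field Q values Q^MF(mu(x_i))."""
    return sum(mf_q_net(mu) for mu in mean_fields)
```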
After the global Q value is obtained, based on the CTDE idea, $Q_{tot}$ is optimized by minimizing a loss function $L_{tot}$, which in turn optimizes $Q^{MF}(\mu(x_i))$. The loss function takes the form:

$$L_{tot} = \left(y_{tot} - Q_{tot}\right)^2$$

where $y_{tot}$ is the top-level learning target formed from the global reward.
the independent local maximum (IGM) principle is used here, i.e. when the average field Q value Q of each packet is QMF(μ(xi) When all reach the maximum, the global Q value QtotAnd is maximized accordingly. This model is also based on the idea of CTDE to combine the mean field Q values Q of other packets during the top-level learning processMF(μ(xi) ) is centrally trained, but only the mean field Q of the packet is used in the underlying packetMF(μ(xi) Average field Q values for all packets are not collected.
In practical application, in the bottom-level learning phase, the group mean information $\mu(x_i)$ serves as the central node. The importance weight $w_i^k$ of each agent in the $i$-th group is computed based on the attention mechanism, where $k$ is the agent index:

$$w_i^k = \mathrm{softmax}_k\!\left( (W_Q\, \mu(x_i))^{\top} (W_K\, x_i^k) \right)$$

where $W_K$ is the parameter of the $K_{key}$ network and $W_Q$ is the parameter of the $K_{query}$ network; the $K_{key}$ and $K_{query}$ networks are embedding networks internal to the attention module. Each agent obtains its local Q value $Q_i^k$ through a local Q-value network, where $t$ denotes the time step.
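The importance weights can be sketched with scaled dot-product attention. The exact score function is an assumption; `W_K` and `W_Q` stand in for the $K_{key}$ and $K_{query}$ network parameters:

```python
import numpy as np

def attention_weights(mu, xs, W_K, W_Q):
    """Softmax importance weight w_i^k of each agent in one group.
    mu: group mean (central node); xs: one row x_i^k per agent."""
    query = W_Q @ mu                              # query from the central node
    keys = xs @ W_K.T                             # one key per agent
    scores = keys @ query / np.sqrt(len(query))   # scaled dot-product scores
    e = np.exp(scores - scores.max())             # numerically stable softmax
    return e / e.sum()
```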
Subsequently, the mean field Q value $Q^{MF}(\mu(x_i))$ passed down from the top layer is fused, and the group local Q value $Q_i$ is then obtained through the weighted summation module:

$$Q_i = Q^{MF}(\mu(x_i)) + \sum_{k=1}^{N(i)} w_i^k\, Q_i^k$$

where $w_i^k$ is the importance of the $k$-th agent in group $i$. Through the group local Q value $Q_i$, both the cooperation of the multiple agents within the group and the cooperation between the bottom layer and the top layer are realized.
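The weighted summation module can be sketched as below; the additive fusion of $Q^{MF}$ with the weighted agent Q values is an assumption, since the text names only a fusion step followed by a weighted summation:

```python
import numpy as np

def group_local_q(weights, agent_qs, q_mf):
    """Group local Q: top-level mean-field Q fused with the
    attention-weighted sum of the agents' local Q values."""
    return q_mf + float(np.dot(weights, agent_qs))
```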
On this basis, the group Q function is trained by minimizing the loss function $L_i$:

$$L_i = \left(y_i - Q_i\right)^2, \qquad y_i = r_i + \gamma\, Q^{MF}(\mu(x_i'))$$

where $Q^{MF}(\mu(x_i))$ is the guideline passed from the top layer to the bottom layer, representing the mean field Q value delivered by the top layer; $r_i$ is the true reward value returned by the environment for group $i$; $\gamma$ is the discount factor; and $y_i$ is the training target of group $i$.
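A sketch of the group-level objective under the same hedged one-step form; using the top layer's mean field Q at the next step as the guideline inside the target is an assumption consistent with the description:

```python
def group_loss(q_group, reward_i, q_mf_next, gamma=0.99):
    """Squared TD error L_i = (y_i - Q_i)^2 with
    y_i = r_i + gamma * Q^MF(mu(x_i')) as the training target of group i."""
    y_i = reward_i + gamma * q_mf_next
    return (y_i - q_group) ** 2
```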
In one embodiment, training the top-level learning model and the bottom-level learning model using the initial decision control data set to obtain a final decision control model specifically comprises: dividing the agent group into a corresponding number of groups according to a preset group count, and acquiring the average behavior value and the reward sum of each group according to the initial decision control data set; acquiring a learning target according to the average behavior value and the reward sum of each group; training the top-level learning model according to the learning target and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, training the bottom-level learning model according to the first mean neural network and the initial decision control data set, and recording the number of training iterations; judging whether the number of training iterations reaches a preset threshold; and when it does, stopping training and outputting the final decision control model.
S3: performing decision control on the group interaction of the agents according to the final decision control model.
In one embodiment, after judging whether the number of training iterations reaches the preset threshold, the method further comprises: when the number of training iterations has not reached the preset threshold, continuing the model training.
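The stopping rule above is a simple count check; `train_step` is a hypothetical callable performing one fused top-level plus bottom-level update:

```python
def train_until_threshold(train_step, threshold):
    """Run training steps and stop once the recorded number of
    training iterations reaches the preset threshold."""
    count = 0
    while count < threshold:
        train_step()     # one top-level + bottom-level fused update
        count += 1
    return count         # the final decision control model is output here
```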
This embodiment of the invention describes a decision control method for agent group interaction: by constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-bottom fusion training on it, a final decision control model is obtained and used for decision control, which improves the effectiveness of decision control during agent group interaction.
Second Embodiment
Besides the method, the embodiment of the invention also describes a decision control device for intelligent agent group interaction. Fig. 3 is a block diagram of an embodiment of a decision control device for group interaction of agents according to the present invention.
As shown, the decision control device includes an initial interaction unit 11, a model training unit 12, and a decision control unit 13.
The initial interaction unit 11 is configured to obtain a preset initial decision control model, so that the agent group performs group interaction according to the initial decision control model, thereby obtaining an initial decision control data set. In one embodiment, the initial interaction unit 11 is further configured to: acquire a preset initial decision control model and a preset opponent model of an opponent, initialize a preset group interaction platform, and acquire a first state of an agent and a second state of the opponent, the initial decision control model comprising a local neural network; input the first state into the local neural network to obtain a first behavior and a first reward, input the second state into the opponent model to obtain a second behavior and a second reward, and store the first state, the second state, the first behavior, the second behavior, the first reward, and the second reward into the initial decision control data set; input the first behavior and the second behavior into the group interaction platform, thereby correspondingly obtaining a third state of the agent and a fourth state of the opponent; and input the third state into the local neural network to obtain a third behavior and a third reward, input the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and store the third state, the fourth state, the third behavior, the fourth behavior, the third reward, and the fourth reward into the initial decision control data set.
The model training unit 12 is configured to train a preset top-level learning model and a preset bottom-level learning model using the initial decision control data set, thereby obtaining a final decision control model. In one embodiment, the model training unit 12 is further configured to: divide the agent group into a corresponding number of groups according to a preset group count, and acquire the average behavior value and the reward sum of each group according to the initial decision control data set; acquire a learning target according to the average behavior value and the reward sum of each group; train the top-level learning model according to the learning target and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, train the bottom-level learning model according to the first mean neural network and the initial decision control data set, and record the number of training iterations; judge whether the number of training iterations reaches a preset threshold; and when it does, stop training and output the final decision control model.
And the decision control unit 13 is configured to perform decision control on group interaction of the agents according to the final decision control model.
The embodiment of the invention discloses a decision control device for agent group interaction. By constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-bottom fusion training on it, a final decision control model is obtained and decision control is then performed; the device thereby improves the effectiveness of decision control during agent group interaction.
Third Embodiment
Besides the method and the device, the invention also describes a decision control system for intelligent agent group interaction. FIG. 4 is a block diagram illustrating one embodiment of an intelligent agent group interaction decision control system in accordance with the present invention.
As shown in the figure, the decision control system includes a decision control module 1 and a data storage module 2, the decision control module 1 being communicatively connected to the data storage module 2; the decision control module 1 is configured to perform decision control of group interaction on the agent group according to the above decision control method for agent group interaction, and the data storage module 2 is configured to store all data.
In practical application, the decision control module 1 and the data storage module 2 are respectively connected to the intelligent agent group in a communication manner, so that the decision control module 1 can perform group interaction decision control on the intelligent agent group.
The embodiment of the invention discloses a decision control system for agent group interaction. By constructing an initial decision control model comprising a top-level learning model and a bottom-level learning model and performing top-bottom fusion training on it, a final decision control model is obtained and decision control is then performed; the system thereby improves the effectiveness of decision control during agent group interaction.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the invention, may occur to those skilled in the art and are intended to be included within the scope of the invention.
Claims (8)
1. A decision control method for group interaction of intelligent agents is characterized by comprising the following steps:
acquiring a preset initial decision control model, and enabling an intelligent agent group to perform group interaction according to the initial decision control model so as to acquire an initial decision control data set; the initial decision control model comprises a top-layer learning model and a bottom-layer learning model;
training the top-level learning model and the bottom-level learning model by using the initial decision control data set so as to obtain a final decision control model;
and carrying out decision control on the group interaction of the intelligent agent according to the final decision control model.
2. The method for controlling decision making for agent group interaction according to claim 1, wherein obtaining a preset initial decision control model, and enabling an agent group to perform group interaction according to the initial decision control model, thereby obtaining an initial decision control data set specifically comprises:
acquiring a preset initial decision control model and a preset opponent model of an opponent, initializing a preset group interaction platform, and acquiring a first state of an agent and a second state of the opponent; the initial decision control model comprises a local neural network;
inputting the first state into the local neural network to obtain a first behavior and a first reward, inputting the second state into the opponent model to obtain a second behavior and a second reward, and storing the first state, the second state, the first behavior, the second behavior, the first reward, and the second reward into the initial decision control data set;
inputting the first behavior and the second behavior into the group interaction platform, so as to correspondingly obtain a third state of the agent and a fourth state of the opponent;
inputting the third state into the local neural network to obtain a third behavior and a third reward, inputting the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and storing the third state, the fourth state, the third behavior, the fourth behavior, the third reward, and the fourth reward into the initial decision control data set.
3. The method according to claim 2, wherein the training of the top-level learning model and the bottom-level learning model using the initial decision control data set to obtain a final decision control model comprises:
dividing the agent group into the corresponding number of groups according to a preset group number, and acquiring the average behavior value and the reward sum of each group from the initial decision control data set;
acquiring a learning objective from the average behavior value and the reward sum of each group;
training the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, training the bottom-level learning model according to the first mean neural network and the initial decision control data set, and recording the number of training iterations;
determining whether the number of training iterations reaches a preset threshold;
and when the number of training iterations reaches the preset threshold, stopping training and outputting the final decision control model.
4. The method according to claim 3, wherein after determining whether the number of training iterations reaches the preset threshold, the method further comprises:
and when the number of training iterations does not reach the preset threshold, continuing the model training.
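The training loop of claims 3 and 4 can likewise be sketched. The grouping and per-group statistics follow the claim wording (average behavior value and reward sum per group, iteration until a count threshold); the `train` internals that stand in for the top-level and bottom-level models are illustrative assumptions, not the patent's actual networks.

```python
import numpy as np

def group_statistics(behaviors, rewards, n_groups):
    """Split agents into n_groups and compute each group's average behavior
    value and reward sum, as recited in claim 3."""
    groups = np.array_split(np.arange(len(behaviors)), n_groups)
    mean_behaviors = [behaviors[g].mean() for g in groups]
    reward_sums = [rewards[g].sum() for g in groups]
    return mean_behaviors, reward_sums

def train(dataset, n_groups=2, max_iters=5):
    """Iterate until the preset threshold number of training iterations
    (claims 3-4). The model updates are placeholders."""
    iters = 0
    model = {"top": None, "bottom": None}
    while iters < max_iters:  # claim 4: continue training below the threshold
        # dataset tuples are (s_agent, s_opp, a_agent, a_opp, r_agent, r_opp)
        behaviors = np.concatenate([t[2] for t in dataset]).astype(float)
        rewards = np.concatenate([t[4] for t in dataset])
        mean_b, sum_r = group_statistics(behaviors, rewards, n_groups)
        # Learning objective from the per-group statistics (illustrative form).
        objective = sum(m + s for m, s in zip(mean_b, sum_r))
        model["top"] = objective          # stand-in for the first top-level model
        model["bottom"] = objective / 2   # stand-in for the bottom-level update
        iters += 1
    return model, iters
```

The real top-level training would yield the first mean neural network that conditions the bottom-level update; here both are collapsed into scalar placeholders to show only the control flow.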
5. A decision control device for intelligent agent group interaction, characterized by comprising an initial interaction unit, a model training unit, and a decision control unit, wherein:
the initial interaction unit is configured to acquire a preset initial decision control model, cause the agent group to perform group interaction according to the initial decision control model, and acquire an initial decision control data set;
the model training unit is configured to train a preset top-level learning model and a preset bottom-level learning model using the initial decision control data set to obtain a final decision control model;
and the decision control unit is configured to perform decision control on the group interaction of the agent group according to the final decision control model.
6. The decision control device for intelligent agent group interaction according to claim 5, wherein the initial interaction unit is further configured to:
acquire a preset initial decision control model and a preset opponent model of an opponent, initialize a preset group interaction platform, and acquire a first state of the agent group and a second state of the opponent, wherein the initial decision control model comprises a local neural network;
input the first state into the local neural network to obtain a first behavior and a first reward, input the second state into the opponent model to obtain a second behavior and a second reward, and store the first state, the second state, the first behavior, the second behavior, the first reward, and the second reward in the initial decision control data set;
input the first behavior and the second behavior into the group interaction platform to correspondingly obtain a third state of the agent group and a fourth state of the opponent;
input the third state into the local neural network to obtain a third behavior and a third reward, input the fourth state into the opponent model to obtain a fourth behavior and a fourth reward, and store the third state, the fourth state, the third behavior, the fourth behavior, the third reward, and the fourth reward in the initial decision control data set.
7. The decision control device for intelligent agent group interaction according to claim 6, wherein the model training unit is further configured to:
divide the agent group into the corresponding number of groups according to a preset group number, and acquire the average behavior value and the reward sum of each group from the initial decision control data set;
acquire a learning objective from the average behavior value and the reward sum of each group;
train the top-level learning model according to the learning objective and the initial decision control data set to obtain a first top-level model and a corresponding first mean neural network, train the bottom-level learning model according to the first mean neural network and the initial decision control data set, and record the number of training iterations;
determine whether the number of training iterations reaches a preset threshold;
and when the number of training iterations reaches the preset threshold, stop training and output the final decision control model.
8. A decision control system for intelligent agent group interaction, characterized by comprising a decision control module and a data storage module, wherein the decision control module is in communication connection with the data storage module; the decision control module is configured to perform group interaction decision control on the agent group according to the decision control method for intelligent agent group interaction of any one of claims 1 to 4, and the data storage module is configured to store all data.
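The system of claim 8 is two communicating modules. A minimal sketch of that wiring, with entirely illustrative class and method names (the patent specifies only the modules and their connection, not an API):

```python
class DataStorageModule:
    """Stores all data produced during decision control (claim 8)."""
    def __init__(self):
        self._records = []

    def store(self, record):
        self._records.append(record)

    def all_data(self):
        return list(self._records)

class DecisionControlModule:
    """Performs group interaction decision control and forwards records
    to the connected storage module."""
    def __init__(self, storage, model):
        self.storage = storage   # communication link to the storage module
        self.model = model       # the final decision control model (a callable here)

    def decide(self, state):
        behavior = self.model(state)
        self.storage.store((state, behavior))
        return behavior
```

Here the "communication connection" is a direct object reference; in a deployed system it could equally be a message queue or RPC channel.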
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111676244.8A CN114298244A (en) | 2021-12-31 | 2021-12-31 | Decision control method, device and system for intelligent agent group interaction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114298244A true CN114298244A (en) | 2022-04-08 |
Family
ID=80975692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111676244.8A Pending CN114298244A (en) | 2021-12-31 | 2021-12-31 | Decision control method, device and system for intelligent agent group interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114298244A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115840892A (*) | 2022-12-09 | 2023-03-24 | 中山大学 | Multi-agent hierarchical autonomous decision-making method and system in complex environment |
CN115840892B (*) | 2022-12-09 | 2024-04-19 | 中山大学 | Multi-agent hierarchical autonomous decision-making method and system in complex environment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7537523B2 (en) | Dynamic player groups for interest management in multi-character virtual environments | |
CN110852448A (en) | Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning | |
CN111625361A (en) | Joint learning framework based on cooperation of cloud server and IoT (Internet of things) equipment | |
Xu et al. | Learning multi-agent coordination for enhancing target coverage in directional sensor networks | |
CN114415735B (en) | Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method | |
CN114546608B (en) | Task scheduling method based on edge calculation | |
CN114896899B (en) | Multi-agent distributed decision method and system based on information interaction | |
CN113642233B (en) | Group intelligent collaboration method for optimizing communication mechanism | |
CN112634019A (en) | Default probability prediction method for optimizing grey neural network based on bacterial foraging algorithm | |
CN117289691A (en) | Training method for path planning agent for reinforcement learning in navigation scene | |
CN114757362A (en) | Multi-agent system communication method based on edge enhancement and related device | |
CN115022231B (en) | Optimal path planning method and system based on deep reinforcement learning | |
CN114298244A (en) | Decision control method, device and system for intelligent agent group interaction | |
Kamra et al. | Deep fictitious play for games with continuous action spaces | |
Liu et al. | Learning communication for cooperation in dynamic agent-number environment | |
CN116340737A (en) | Heterogeneous cluster zero communication target distribution method based on multi-agent reinforcement learning | |
CN116992928A (en) | Multi-agent reinforcement learning method for fair self-adaptive traffic signal control | |
CN113592079B (en) | Collaborative multi-agent communication method oriented to large-scale task space | |
CN116367190A (en) | Digital twin function virtualization method for 6G mobile network | |
CN114757092A (en) | System and method for training multi-agent cooperative communication strategy based on teammate perception | |
CN110598835B (en) | Automatic path-finding method for trolley based on Gaussian variation genetic algorithm optimization neural network | |
Ebrahimi et al. | Dynamic difficulty adjustment in games by using an interactive self-organizing architecture | |
Liu | Shortest path selection algorithm for cold chain logistics transportation based on improved artificial bee colony | |
Dai et al. | Evolutionary neural network for ghost in Ms. Pac-Man | |
CN118366009B (en) | Pedestrian track prediction method and system based on human group behavior characteristic guidance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||