CN115797394A - Multi-agent covering method based on reinforcement learning

Multi-agent covering method based on reinforcement learning

Info

Publication number
CN115797394A
CN115797394A (application CN202211432494.1A)
Authority
CN
China
Prior art keywords
agent
mobile
coverage
area
agents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211432494.1A
Other languages
Chinese (zh)
Other versions
CN115797394B (en)
Inventor
孙新苗
任明里
丁大伟
任莹莹
王恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN202211432494.1A priority Critical patent/CN115797394B/en
Publication of CN115797394A publication Critical patent/CN115797394A/en
Application granted granted Critical
Publication of CN115797394B publication Critical patent/CN115797394B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention discloses a multi-agent coverage method based on reinforcement learning, which comprises the following steps: determining the positions of a plurality of stationary agents in an area with the goal of maximizing coverage performance, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the stationary agents; calculating the maximum coverage performance obtainable by the mobile agents; setting the observations and actions of the mobile agents, and setting the reward of the mobile agents based on the maximum coverage performance obtainable by the mobile agents; each mobile agent aims to maximize its own reward function, and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously for distributed training, which yields the motion plan of each mobile agent and thereby covers the areas that are not yet effectively covered. The technical scheme of the invention enables multiple agents to cooperatively achieve effective coverage of the area and improves the coverage performance of the area.

Description

Multi-agent covering method based on reinforcement learning
Technical Field
The invention relates to the technical field of multi-agent system coverage optimization, in particular to a multi-agent coverage method based on reinforcement learning.
Background
With the rapid development of computers, micro-electro-mechanical systems, robotics and communication technologies, multi-agent systems are attracting increasing attention and are being applied to many fields, including area coverage. Multi-agent area coverage means that multiple agents form a team and effectively cover the whole area through a cooperation strategy. Performing the area coverage task cooperatively allows the target task to be completed more efficiently, overcomes the limitation on the number and angle of sensors of a single agent, and gives the coverage system redundancy. At present, although existing schemes can achieve full coverage of an area with multiple agents, they cannot improve coverage performance while achieving effective coverage.
Disclosure of Invention
The invention provides a multi-agent coverage method based on reinforcement learning, which quickly achieves effective coverage of an area and improves the area coverage performance.
In order to solve the technical problems, the invention provides the following technical scheme:
in one aspect, the present invention provides a reinforcement learning-based multi-agent coverage method, the multi-agent comprising a plurality of stationary agents and a plurality of mobile agents, the multi-agent coverage method comprising:
determining the positions of the plurality of stationary agents in an area with the goal of maximizing coverage performance, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the stationary agents;
calculating the maximum coverage performance obtainable by the mobile agents;
setting the observation and action of each mobile agent on the environment, and setting the reward of the mobile agents based on the maximum coverage performance obtainable by the mobile agents; each mobile agent aims to maximize its own reward function, and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously for distributed training, which yields the motion plan of each mobile agent and thereby covers the areas that are not yet effectively covered.
Further, the determining the locations of the plurality of stationary agents in the area with the goal of maximizing coverage performance includes:
the position of a plurality of stationary agents in an area is adjusted such that the coverage performance is as large as possible.
Further, the coverage performance H(S) is calculated as follows:
H(S)=∫R(x)P(x,S)dx
wherein P (x, S) is the joint detection probability of the multi-agent at point x in the area,
P(x, S) = 1 - ∏_{i=1}^{N} (1 - p_i(x, s_i))
where p_i(x, s_i) is the detection probability of the i-th agent, N is the number of agents, and R(x) is the event density function.
Further, when the area is divided into an effective coverage area and an ineffective coverage area, whether a point x in the area is effectively covered is judged according to whether the joint detection probability P(x, S) of the multi-agent at x is larger than a preset threshold; when P(x, S) is larger than the preset threshold, x is effectively covered, otherwise x is not effectively covered.
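As an illustration (not part of the claimed method), the coverage performance and the effective-coverage test above can be evaluated numerically on a discretized grid. A minimal Python sketch follows; the Gaussian form of p_i(x, s_i) and the threshold ρ = 0.9 are assumptions made only for this example.

```python
import numpy as np

def detection_prob(points, agent_pos, sigma=3.0):
    """p_i(x, s_i): assumed Gaussian, monotonically decreasing in ||x - s_i||."""
    d2 = np.sum((points - agent_pos) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def joint_detection_prob(points, agent_positions, sigma=3.0):
    """P(x, S) = 1 - prod_i (1 - p_i(x, s_i))."""
    miss = np.ones(points.shape[0])
    for s in agent_positions:
        miss *= 1.0 - detection_prob(points, s, sigma)
    return 1.0 - miss

def coverage_performance(points, density, agent_positions, cell_area=1.0):
    """H(S) = integral of R(x) * P(x, S) dx, approximated by a grid sum."""
    return float(np.sum(density * joint_detection_prob(points, agent_positions)) * cell_area)

def effective_mask(points, agent_positions, rho=0.9):
    """Cells with P(x, S) > rho count as effectively covered."""
    return joint_detection_prob(points, agent_positions) > rho

# Example: 20 x 20 grid, uniform event density, two stationary agents.
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
grid = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(float)
R = np.ones(grid.shape[0])
S = np.array([[5.0, 5.0], [14.0, 14.0]])
print(coverage_performance(grid, R, S), effective_mask(grid, S).mean())
```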
Further, the observation of the mobile agent on the environment consists of three binary images, wherein:
the first binary image represents an area which is not effectively covered currently;
the second binary image represents the position of the current mobile agent;
the third binary image represents the locations of other mobile agents in addition to the current mobile agent.
Further, the action set of the mobile agent is {0,1,2,3,4}, which respectively represents that the mobile agent is stationary, the mobile agent moves upwards, the mobile agent moves downwards, the mobile agent moves leftwards and the mobile agent moves rightwards.
Further, the Reward of the environment to the mobile agent is:
Reward = (H_current - H_max)/10 + incres*30
where H_current is the coverage performance of the mobile agent at the current position, H_max is the maximum coverage performance obtainable by the mobile agent, and incres is the area of effective coverage newly added compared with the previous moment. The first part of the reward represents the gap between the coverage performance of the mobile agent at the current position and the maximum value, and the second part is the newly added effectively covered area compared with the previous moment.
Furthermore, when a plurality of mobile agents interact with the environment simultaneously for distributed training based on the reinforcement learning algorithm, the actor network and the critic network of each mobile agent are set to two convolutional layers followed by three fully connected layers; the first convolutional layer has 16 convolution kernels of size 20 × 20, the second convolutional layer has 8 convolution kernels of size 10 × 10, and the three fully connected layers have 256, 128 and 64 channels, respectively.
In yet another aspect, the present invention also provides an electronic device comprising a processor and a memory; wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the above-described method.
In yet another aspect, the present invention also provides a computer-readable storage medium having at least one instruction stored therein, which is loaded and executed by a processor to implement the above-mentioned method.
The technical scheme provided by the invention has the beneficial effects that at least:
1. The invention enables multiple agents to cooperatively achieve effective coverage of the area.
2. The invention utilizes the decision optimization capability of reinforcement learning, and can improve the coverage performance of the area while realizing effective coverage. The invention has the advantages of high efficiency and strong robustness.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a reinforcement learning based multi-agent overlay method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a static agent location deployment provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of mobile agent and environment interaction provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the effective coverage ratio as a function of the number of steps, provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the coverage performance as a function of the number of steps, provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
First embodiment
The embodiment provides a multi-agent covering method based on reinforcement learning. First, it should be noted that the multi-agent in this embodiment includes two types of agents, i.e., a stationary agent and a mobile agent, and by controlling the movement of the mobile agent, the effective coverage of the area is realized and the coverage performance of the area is improved.
Based on the above, the execution flow of the method of the embodiment is shown in fig. 1, and includes the following steps:
s1, with the aim of maximizing coverage performance, determining the positions of a plurality of static intelligent agents in an area, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the static intelligent agents;
therein, it is required toIt is noted that in determining the location of a stationary agent, the goal should be to maximize coverage performance, i.e., adjust the location of the multi-agent S = (S) 1 ,…,s N ) Making the coverage performance function H (S) as large as possible, wherein the coverage performance of the multi-agent in the area is the integral of the product of the event density and the detection probability in the area, namely: h (S) = R (x) P (x, S) dx, wherein P (x, S) is the joint detection probability of the multi-agent system S at point x,
Figure BDA0003945116060000041
p i (x,s i ) The detection probability for the ith agent, typically x and s i The distance between the two intelligent agents is a monotone decreasing function, N is the number of the intelligent agents, and R (x) is an event density function.
In particular, in this embodiment the location deployment of the stationary agents is as shown in FIG. 2, where the grey area is the area where effective coverage has already been achieved. Whether a point x in the area is effectively covered is judged according to whether the joint detection probability P(x, S) of the multi-agent system at x is greater than a threshold ρ: when P(x, S) > ρ, x is effectively covered; otherwise it is not. After the not-yet-covered areas are obtained, the goal of the mobile agents is to cover them, i.e., to reach P(x, S) ≥ ρ at some moment, while improving the coverage performance H(S) as much as possible during the motion.
S2, calculating the maximum coverage performance obtainable by the mobile agents;
It should be noted that, in this embodiment, the maximum coverage performance H_max obtainable by the mobile agents is calculated, i.e., the maximum of the coverage performance H(S) = ∫R(x)P(x,S)dx. This value is used in the calculation of the mobile agents' reward function in the subsequent steps. When the number of mobile agents is small, H_max can generally be computed by a greedy algorithm: on the basis of the stationary agents, one mobile agent is added at a time at the position that increases the coverage performance the most.
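A hedged sketch of this greedy computation is given below. It reuses coverage_performance and the grid variables from the earlier sketch; treating every grid cell as a candidate position is an assumption, since the text does not fix the candidate set.

```python
import numpy as np

def greedy_h_max(points, density, static_positions, n_mobile, cell_area=1.0):
    """Greedily place n_mobile agents, each at the cell giving the largest gain in H(S)."""
    positions = [np.asarray(p, dtype=float) for p in static_positions]
    for _ in range(n_mobile):
        base = coverage_performance(points, density, positions, cell_area)
        best_gain, best_pos = -np.inf, None
        for cand in points:                          # every grid cell is a candidate
            gain = coverage_performance(points, density, positions + [cand], cell_area) - base
            if gain > best_gain:
                best_gain, best_pos = gain, cand
        positions.append(best_pos)
    return coverage_performance(points, density, positions, cell_area), positions

# H_max for two mobile agents added on top of the stationary agents S above.
H_max, placed = greedy_h_max(grid, R, S, n_mobile=2)
```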
S3, setting the observation and action of each mobile agent on the environment, and setting the reward of the mobile agents based on the maximum coverage performance obtainable by the mobile agents; each mobile agent aims to maximize its own reward function, and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously for distributed training, which yields the motion plan of each mobile agent and thereby covers the areas that are not yet effectively covered.
It should be noted that the above steps prepare the reinforcement-learning training of the mobile agents. FIG. 3 illustrates the interaction between three exemplary mobile agents and the environment. Before training, the action set of the agents, the agents' observation of the environment, and the reward given to the agents by the environment need to be set. The environment is a grid environment containing the stationary agents, in which a mobile agent can choose among 5 actions: stay, move up, move down, move left and move right. Accordingly, the action set of a mobile agent is set to {0, 1, 2, 3, 4}, representing stay, up, down, left and right respectively, with a movement distance of one cell. To achieve effective coverage of the area cooperatively, the agent's observation of the environment is set to three binary images: the first binary image encodes which area is not yet effectively covered, with effectively covered cells marked 1 and not-yet-covered cells marked 0; the second binary image marks the position of the current mobile agent with 1; the third binary image marks the cells occupied by the other mobile agents with 1. The reward given by the environment to the agent consists of two parts, which respectively reflect the goals of achieving effective coverage and improving the coverage performance. The Reward given by the environment to the mobile agent is:
Reward = (H_current - H_max)/10 + incres*30
where H_current is the coverage performance of the mobile agent at the current position, and incres is the area of effective coverage newly added compared with the previous moment. The first part of the reward represents the gap between the coverage performance of the mobile agent at the current position and the maximum value H_max, and the second part is the newly added effectively covered area compared with the previous moment. This reward is designed to achieve effective coverage quickly and to improve the coverage performance of the area.
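A minimal sketch of the observation encoding and reward is shown below. The marking convention and the reward constants follow the text; the channel layout (3 × height × width), the row/column orientation and the action-to-offset mapping are assumptions for the example.

```python
import numpy as np

def build_observation(covered_mask, own_pos, other_positions):
    """Three binary images: coverage status, own position, other mobile agents."""
    h, w = covered_mask.shape
    obs = np.zeros((3, h, w), dtype=np.float32)
    obs[0] = covered_mask.astype(np.float32)      # 1 = effectively covered, 0 = not yet covered
    obs[1, own_pos[0], own_pos[1]] = 1.0          # current mobile agent
    for r, c in other_positions:
        obs[2, r, c] = 1.0                        # other mobile agents
    return obs

# Action set {0,1,2,3,4}: stay, up, down, left, right (row index assumed to grow downward).
ACTION_OFFSETS = {0: (0, 0), 1: (-1, 0), 2: (1, 0), 3: (0, -1), 4: (0, 1)}

def reward(h_current, h_max, incres):
    """Reward = (H_current - H_max)/10 + incres * 30."""
    return (h_current - h_max) / 10.0 + incres * 30.0
```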
Further, when a plurality of mobile agents interact with the environment simultaneously for distributed training based on the reinforcement learning algorithm, the actor network and the critic network of each mobile agent are set to two convolutional layers followed by three fully connected layers; the first convolutional layer has 16 convolution kernels of size 20 × 20, the second convolutional layer has 8 convolution kernels of size 10 × 10, and the three fully connected layers have 256, 128 and 64 channels, respectively.
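An illustrative PyTorch sketch of that network shape follows. The padding, activation functions, input grid size and the output heads (5 action logits for the actor, a scalar value for the critic) are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class CoverageNet(nn.Module):
    """Two conv layers (16 kernels of 20x20, then 8 of 10x10) + FC layers of 256/128/64 units."""
    def __init__(self, out_dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=20, padding=10), nn.ReLU(),
            nn.Conv2d(16, 8, kernel_size=10, padding=5), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),     # flattened size depends on the grid, hence lazy
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, out_dim),            # assumed output head
        )

    def forward(self, x):
        return self.body(x)

actor = CoverageNet(out_dim=5)     # logits over {stay, up, down, left, right}
critic = CoverageNet(out_dim=1)    # state value V(s)
obs = torch.zeros(1, 3, 40, 40)    # assumed 40 x 40 grid for the demo
logits, value = actor(obs), critic(obs)
```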
Further, when a plurality of mobile agents are trained simultaneously, this embodiment trains with the proximal policy optimization (PPO) algorithm, a model-free, on-policy policy-gradient reinforcement learning method. The specific procedure is as follows:
a. The actor π(A|S; θ) is initialized with random parameters θ, and the critic V(S; φ) is initialized with random parameters φ.
b. Generate N steps of experience following the current policy; the experience sequence is:
S_{ts}, A_{ts}, R_{ts+1}, …, S_{ts+N-1}, A_{ts+N-1}, R_{ts+N}, S_{ts+N}
where A_t is the action taken in state S_t, S_{t+1} is the next state, and R_{t+1} is the reward for the transition from S_t to S_{t+1}. In state S_t, the agent computes the probability of each action with π(A|S; θ) and randomly samples the action A_t from that probability distribution.
c. For each step t = ts+1, ts+2, …, ts+N, compute the return value G_t and the advantage function D_t:
D_t = Σ_{k=t}^{ts+N-1} (γλ)^{k-t} δ_k,  δ_k = R_{k+1} + bγV(S_{k+1}; φ) - V(S_k; φ),  G_t = D_t + V(S_t; φ)
where b is 0 when S_{ts+N} is a terminal state and 1 otherwise, λ is a smoothing coefficient, and γ is a discount coefficient.
d. Randomly draw a mini-batch of size M from the current experience set and learn from it: the critic parameters φ are updated by minimizing the loss function
L_critic(φ) = (1/M) Σ_{i=1}^{M} (G_i - V(S_i; φ))^2
and the actor parameters θ are updated by minimizing the clipped actor loss function
L_actor(θ) = -(1/M) Σ_{i=1}^{M} min(r_i(θ) D_i, c_i(θ) D_i)
where r_i(θ) = π(A_i | S_i; θ) / π(A_i | S_i; θ_old) and c_i(θ) = max(min(r_i(θ), 1+ε), 1-ε). To facilitate the exploration of the agents, an entropy loss term
E_i(θ) = -w Σ_a π(a | S_i; θ) ln π(a | S_i; θ)
with entropy weight w is added to the actor loss.
e. Repeating b to d until the training termination condition is reached.
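A hedged PyTorch sketch of steps c and d (advantage estimation over one N-step segment and the clipped PPO losses) is given below; the entropy weight, the handling of the bootstrap value and the minibatch mechanics are assumptions.

```python
import torch

def advantages_and_returns(rewards, values, bootstrap_value, terminal, gamma=0.99, lam=0.95):
    """D_t (GAE) and G_t = D_t + V(S_t) for one segment; b = 0 if the segment ends in a terminal state."""
    b = 0.0 if terminal else 1.0
    n = len(rewards)
    adv = torch.zeros(n)
    next_value, running = b * bootstrap_value, 0.0
    for t in reversed(range(n)):
        delta = rewards[t] + gamma * next_value - values[t]   # delta_k
        running = delta + gamma * lam * running
        adv[t] = running
        next_value = values[t]
    return adv, adv + values                                   # D_t, G_t

def ppo_losses(new_logp, old_logp, adv, returns, value_pred, entropy, eps=0.2, w=0.01):
    """Clipped actor loss with entropy bonus, and squared-error critic loss."""
    ratio = torch.exp(new_logp - old_logp)                     # r_i(theta)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)         # c_i(theta)
    actor_loss = -torch.min(ratio * adv, clipped * adv).mean() - w * entropy.mean()
    critic_loss = ((returns - value_pred) ** 2).mean()
    return actor_loss, critic_loss
```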
After training is completed by executing the above steps, the change of the effective coverage ratio with the number of steps is shown in FIG. 4, from which it can be seen that the coverage rate of this embodiment reaches 97%. The change of the coverage performance with the number of steps is shown in FIG. 5, which shows that the coverage performance is improved both while effective coverage is being achieved and after it has been achieved.
In summary, this embodiment provides a multi-agent coverage method based on reinforcement learning that enables multiple agents to cooperatively achieve effective coverage of an area. By exploiting the decision-optimization capability of reinforcement learning, the method improves the coverage performance of the area while achieving effective coverage. The method has the advantages of high efficiency and strong robustness.
Second embodiment
The present embodiment provides an electronic device, which includes a processor and a memory; wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the method of the first embodiment.
The electronic device may vary considerably in configuration and performance, and may include one or more processors (CPUs) and one or more memories, where the memory stores at least one instruction that is loaded by the processor to execute the above method.
Third embodiment
The present embodiment provides a computer-readable storage medium, which stores at least one instruction, and the instruction is loaded and executed by a processor to implement the method of the first embodiment. The computer readable storage medium may be, among others, ROM, random access memory, CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like. The instructions stored therein may be loaded by a processor in the terminal and perform the above-described method.
Furthermore, it should be noted that the present invention may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, an embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises that element.
Finally, it should be noted that while the above describes a preferred embodiment of the invention, it will be appreciated by those skilled in the art that, once having the benefit of the teaching of the present invention, numerous modifications and adaptations may be made without departing from the principles of the invention and are intended to be within the scope of the invention. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Claims (8)

1. A multi-agent coverage method based on reinforcement learning, wherein the multi-agent comprises a plurality of stationary agents and a plurality of mobile agents, the multi-agent coverage method comprising:
determining the positions of the plurality of stationary agents in an area with the goal of maximizing coverage performance, and dividing the area into an effective coverage area and an ineffective coverage area according to the positions of the stationary agents;
calculating the maximum coverage performance obtainable by the mobile agents;
setting the observation and action of each mobile agent on the environment, and setting the reward of the mobile agents based on the maximum coverage performance obtainable by the mobile agents; each mobile agent aims to maximize its own reward function, and, based on a reinforcement learning algorithm, the mobile agents interact with the environment simultaneously for distributed training, which yields the motion plan of each mobile agent and thereby covers the areas that are not yet effectively covered.
2. The reinforcement learning-based multi-agent coverage method of claim 1, wherein said determining locations of a plurality of stationary agents in an area with the goal of maximizing coverage performance comprises:
the position of a plurality of stationary agents in an area is adjusted such that the coverage performance is as large as possible.
3. The reinforcement learning-based multi-agent coverage method of claim 2, wherein the coverage performance H(S) is calculated as follows:
H(S)=∫R(x)P(x,S)dx
wherein P (x, S) is the joint detection probability of the multi-agent at point x in the region,
P(x, S) = 1 - ∏_{i=1}^{N} (1 - p_i(x, s_i))
where p_i(x, s_i) is the detection probability of the i-th agent, N is the number of agents, and R(x) is the event density function.
4. A reinforcement learning-based multi-agent coverage method as claimed in claim 3, wherein, when dividing the area into an effective coverage area and an ineffective coverage area, whether a point x in the area is effectively covered is judged according to whether the joint detection probability P(x, S) of the multi-agent at x is larger than a preset threshold; when P(x, S) is larger than the preset threshold, x is effectively covered, otherwise x is not effectively covered.
5. The reinforcement learning-based multi-agent coverage method of claim 1, wherein the mobile agent's observation of the environment consists of three binary images, wherein:
the first binary image represents an area which is not effectively covered currently;
the second binary image represents the position of the current mobile agent;
the third binary image represents the locations of other mobile agents in addition to the current mobile agent.
6. The reinforcement learning-based multi-agent coverage method of claim 5, wherein the action set of the mobile agent is {0,1,2,3,4}, representing the mobile agent being stationary, moving up, moving down, moving left and moving right, respectively.
7. The reinforcement learning-based multi-agent coverage method of claim 6, wherein the Reward given by the environment to the mobile agent is:
Reward = (H_current - H_max)/10 + incres*30
where H_current is the coverage performance of the mobile agent at the current position, H_max is the maximum coverage performance obtainable by the mobile agent, and incres is the effective coverage area newly added compared with the previous moment; the first part of the reward represents the gap between the coverage performance of the mobile agent at the current position and the maximum value, and the second part is the newly added effectively covered area compared with the previous moment.
8. A reinforcement learning-based multi-agent coverage method as claimed in any one of claims 1 to 7, wherein, when a plurality of mobile agents interact with the environment simultaneously for distributed training based on the reinforcement learning algorithm, the actor network and the critic network of each mobile agent are set to two convolutional layers followed by three fully connected layers; the first convolutional layer has 16 convolution kernels of size 20 × 20, the second convolutional layer has 8 convolution kernels of size 10 × 10, and the three fully connected layers have 256, 128 and 64 channels, respectively.
CN202211432494.1A 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning Active CN115797394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211432494.1A CN115797394B (en) 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211432494.1A CN115797394B (en) 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN115797394A true CN115797394A (en) 2023-03-14
CN115797394B CN115797394B (en) 2023-09-05

Family

ID=85438088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211432494.1A Active CN115797394B (en) 2022-11-15 2022-11-15 Multi-agent coverage method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115797394B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
WO2022083029A1 (en) * 2020-10-19 2022-04-28 深圳大学 Decision-making method based on deep reinforcement learning
CN114879742A (en) * 2022-06-17 2022-08-09 电子科技大学 Unmanned aerial vehicle cluster dynamic coverage method based on multi-agent deep reinforcement learning
CN115327926A (en) * 2022-09-15 2022-11-11 中国科学技术大学 Multi-agent dynamic coverage control method and system based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022083029A1 (en) * 2020-10-19 2022-04-28 深圳大学 Decision-making method based on deep reinforcement learning
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
CN114879742A (en) * 2022-06-17 2022-08-09 电子科技大学 Unmanned aerial vehicle cluster dynamic coverage method based on multi-agent deep reinforcement learning
CN115327926A (en) * 2022-09-15 2022-11-11 中国科学技术大学 Multi-agent dynamic coverage control method and system based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUZE FENG et al.: "Rapid Coverage Control with Multi-agent Systems Based on K-Means Algorithm", 2020 7th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), pages 870-873 *
双炜: "Routing algorithm for LEO satellite networks applying multi-agent link cognition" (in Chinese), Spacecraft Engineering (航天器工程), vol. 24, no. 4, pages 83-87 *
薛玉玺: "Research on multi-agent swarm area coverage algorithms based on deep reinforcement learning" (in Chinese), China Masters' Theses Full-text Database, Information Science and Technology, no. 01, pages 140-397 *

Also Published As

Publication number Publication date
CN115797394B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN111563188B (en) Mobile multi-agent cooperative target searching method
Ross et al. Efficient reductions for imitation learning
CN110327624B (en) Game following method and system based on curriculum reinforcement learning
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
US20190286979A1 (en) Reinforcement Learning for Concurrent Actions
CN106250931A (en) A kind of high-definition picture scene classification method based on random convolutional neural networks
CN109064514A (en) A kind of six-freedom degree pose algorithm for estimating returned based on subpoint coordinate
CN111105034A (en) Multi-agent deep reinforcement learning method and system based on counter-fact return
CN112533237B (en) Network capacity optimization method for supporting large-scale equipment communication in industrial internet
US20220176554A1 (en) Method and device for controlling a robot
CN116051683B (en) Remote sensing image generation method, storage medium and device based on style self-organization
JP7448683B2 (en) Learning options for action selection using meta-gradient in multi-task reinforcement learning
CN113821041A (en) Multi-robot collaborative navigation and obstacle avoidance method
CN111553242B (en) Training method for generating countermeasure network for predicting driving behavior and electronic device
CN110335466B (en) Traffic flow prediction method and apparatus
CN110930429A (en) Target tracking processing method, device and equipment and readable medium
CN113947022B (en) Near-end strategy optimization method based on model
CN115797394A (en) Multi-agent covering method based on reinforcement learning
CN113239629B (en) Method for reinforcement learning exploration and utilization of trajectory space determinant point process
KR102299135B1 (en) Method and device that providing deep-running-based baduk game service
CN114330933B (en) Execution method of meta-heuristic algorithm based on GPU parallel computation and electronic equipment
CN109409507A (en) Neural network construction method and equipment
CN114840024A (en) Unmanned aerial vehicle control decision method based on context memory
WO2022036567A1 (en) Target detection method and device, and vehicle-mounted radar
KR102299141B1 (en) Method and device for providing deep-running-based baduk game service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant