CN111880564A - Multi-agent area searching method based on collaborative reinforcement learning - Google Patents
- Publication number
- CN111880564A (application CN202010710554.6A)
- Authority
- CN
- China
- Prior art keywords
- agent
- cluster
- gamma
- reinforcement learning
- value table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
Abstract
The invention discloses a multi-agent area searching method based on collaborative reinforcement learning, which comprises the following steps: S1, establishing a motion model of a cluster system; S2, defining a fusion mode of a γ information map and a cluster information map; S3, defining a state space and a behavior space required by reinforcement learning training; S4, defining an interactive reinforcement learning training method according to the state space and the behavior space; and S5, acquiring the Q value table obtained by training, carrying out the region search according to the motion model, and determining the position at the next moment according to the Q value table. The invention realizes the sharing of neighbors' learning experience; during sharing, useless experience is filtered out by screening, so that the learning efficiency is improved while the communication traffic between agents is greatly reduced.
Description
Technical Field
The invention relates to multi-agent area search, in particular to a multi-agent area search method based on collaborative reinforcement learning.
Background
The clustering phenomenon is very common in nature. With the rise of artificial intelligence in recent years, intelligent cluster control has become a popular research field, and great progress has been made on agents such as unmanned aerial vehicles, unmanned vehicles and mobile robots. The gradual maturing of single-agent technology is pushing intelligent systems toward clusters, and the flocking cluster control algorithm is widely applied to tasks such as unmanned aerial vehicle search, reconnaissance and strike, which face increasingly complex combat environments and multi-task requirements.
Q-learning is a typical reinforcement learning algorithm that converts learned experience into a Q table, from which the best strategy can be selected. During the traversal by the agent cluster, the γ points in the multi-agent search system are planned through Q-learning; after the Q-learning algorithm finishes learning, the optimal γ-point planning strategy is obtained, completing a rapid traversal of the target area.
Because the traditional Q-learning algorithm is an independent learning method, it does not draw on the historical experience of its neighbors during learning. As a result, the agents of a multi-agent system repeatedly learn the experience of the same behavior in the same state, which greatly reduces the learning efficiency of the system.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a multi-agent area searching method based on collaborative reinforcement learning that realizes the sharing of neighbors' learning experience; during sharing, useless experience is filtered out by screening, so that the learning efficiency is improved while the communication traffic among agents is greatly reduced.
The purpose of the invention is realized by the following technical scheme: a multi-agent area searching method based on collaborative reinforcement learning comprises the following steps:
s1, establishing a motion model of a cluster system;
s2, defining a fusion mode of a gamma information map and a cluster information map;
s3, defining a state space and a behavior space required by reinforcement learning training;
s4, defining an interactive reinforcement learning training method according to the state space and the behavior space;
and S5, acquiring a Q value table obtained by training, carrying out region search according to the motion model, and determining the position of the next moment according to the Q value table.
Further, the step S1 includes the following sub-steps:
Based on a flocking cluster control algorithm, assume that a cluster V = {1, 2, ..., p} contains p agents; the ith agent in the cluster is denoted agent i, and its kinetic model is expressed as the following equation:

$\dot{p}_i = v_i, \qquad \dot{v}_i = u_i$

where $p_i$ is the location of agent i, $v_i$ is the speed of agent i, and $u_i$, the acceleration of agent i, is the control input of the cluster agent;
During the search process, the control input of each agent of the cluster is expressed as:

$u_i = u_i^{\alpha} + u_i^{\gamma}$

where $u_i^{\alpha}$ is the control input by which the cluster agents avoid collisions, and $u_i^{\gamma}$ is the control quantity moving the cluster agent to the desired position;
where $c_s^{\alpha}$ is a positive constant, and the potential field force between agent i and agent j is defined as follows:

$\phi_{\alpha}(z) = \rho_h(z / r_{\alpha})\,\phi(z - d_{\alpha}), \qquad \phi(z) = \tfrac{1}{2}\big[(a + b)\,\sigma_1(z + c) + (a - b)\big]$

where z is the input quantity and $p_i$ is the location of cluster agent i;
$d_{\alpha} = \|d\|_{\sigma}$
where $r_{\alpha}$ is the communication distance between cluster agents, and $\sigma_1$, a, b and c are self-defined parameters;
where h and l are constants; the design of the $\rho_h$ function guarantees the smoothness of the potential field function. To guarantee that the norm is differentiable, the σ-norm is defined as:

$\|z\|_{\sigma} = \frac{1}{\varepsilon}\left(\sqrt{1 + \varepsilon \|z\|^{2}} - 1\right)$

where ε is a self-defined parameter;
The control quantity moving the cluster agent to the desired position is as follows:

$u_i^{\gamma} = -c_1^{\gamma}(p_i - p_{\gamma}) - c_2^{\gamma} v_i$

where $c_1^{\gamma}$ and $c_2^{\gamma}$ are the proportional and differential control parameters of the PID algorithm, $v_i$ is the speed of agent i, and $p_{\gamma}$ is the expected location of agent i at the next moment.
Further, the step S2 includes the following sub-steps:
Assume the traversal area is an m × n rectangular area, and quantize the area to be searched into a γ information map of k × l cells, where each quantized cell corresponds to a γ point; the complete search of the area is thereby converted into a complete traversal of the γ points in the information map, and the γ points form the γ-information-map set of agent i:

$m_i(\gamma) = \{\gamma_{x,y}\}, \quad x = 1, 2, \ldots, k, \; y = 1, 2, \ldots, l;$
where k and l are obtained by:

where $r_s$ is a self-defined parameter representing the perception radius of agent i;
The γ information maps of all agents in the cluster, $\{m_1(\gamma_{x,y}), m_2(\gamma_{x,y}), \ldots, m_p(\gamma_{x,y})\}$, are obtained; if agent i has traversed a γ point, the information of that γ point is $m_i(\gamma_{x,y}) = 1$, otherwise $m_i(\gamma_{x,y}) = 0$. Agent 1, agent 2, ..., agent p establish communication, and each agent fuses its own γ information map with the γ information maps of its neighbors, where the fusion formula is:

$m_s(\gamma_{x,y}) = \max_{i \in V} m_i(\gamma_{x,y})$

where $m_i(\gamma_{x,y})$ is the γ information map of agent i, $m_s(\gamma_{x,y})$ is the complete γ information map of the cluster, and V is the set of cluster agents.
Further, the step S3 includes:
The state space of each agent of the cluster is acquired, and the state of agent i is defined as follows:

where $M_i(\gamma)$ is the γ-information-map coverage of node i, and $\gamma_{x,y}$ is the γ-map location of node i at the next moment;
The behavior space of each agent in the cluster is acquired, where a behavior is the selection of a γ point. When node i is in some state $S_i$, the selectable γ points are the current γ-map location and the 8 locations surrounding it; denoting these 9 locations by 1 to 9, the node behavior space is defined as the following equation:
$A_i = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$.
further, the step S4 includes:
According to the agent's state and behavior, for a typical Q-learning algorithm the Q-value table update function is as follows:

$Q_i^{k+1}(s_i, a_i) = Q_i^{k}(s_i, a_i) + \alpha \big[ r_i + \eta \max_{a_i'} Q_i^{k}(s_i', a_i') - Q_i^{k}(s_i, a_i) \big]$

where k denotes the kth training, α is the learning rate, η is the discount factor, $a_i'$ denotes the next action, and $s_i'$ the next state.
To reduce the computational complexity and communication traffic of the learning algorithm and to accelerate its convergence, an agent in the cluster can obtain the Q values of the other agents it is connected with. Only the state-action entries with larger Q values in the neighbors' Q-value tables are considered as references for updating the agent's Q value, so the Q-value table of the ith agent at the (k+1)th iteration is updated as follows:

$Q_i^{k+1}(s_i, a_i) = \sum_{j \in N_i} w_j\, Q_j^{k}(s_i, a_i)$

where $Q_j^{k}(s_i, a_i)$ is the Q value of the jth agent, $N_i$ is the set of neighbors of agent i, and the weights $w_j$ are defined as follows:
where $q_i$ represents the location of the ith agent of the cluster, $r_a$ is a constant representing the adjacency radius, and $h_r(\cdot)$ is a threshold function defined as follows:
$r_i$ is the return function, defined as follows:
where $\gamma_{x,y}'$ is the next γ point obtained by performing action $a_i$, $0 < c_r < 1$ is a constant, $k_r$ is the number of repeated traversals, T is the time consumed in traversing the γ information map or covering the dynamic area, and r(T) is defined as follows:
where the coefficients are constants, $r_{ref}$ is a constant giving the standard return value for the entire traversal process, and $T_{min}$, the minimum traversal time under ideal conditions, is calculated by the following formula:
where m and n are the size of the search area, k and l are the numbers of cells of the corresponding information map, and $v_{max}$ is a constant representing the maximum speed. The agent selects the action with the maximum weight according to the current environment to update the Q-value table. Compared with the traditional reinforcement learning method, this cooperative reinforcement learning method improves efficiency: the interaction of Q-value tables among the agents in the cluster optimizes the training process and reduces the training time.
Further, the step S5 includes:
Steps S1-S4 are repeated to iteratively update the Q-value tables of the agents in the cluster until the Q-value tables converge. When the whole cluster carries out search tasks, the cluster area-search algorithm trained by collaborative reinforcement learning generates the Q-value table for the behavior of the whole cluster at the next moment; after the collaborative-reinforcement-learning training, each agent selects the best behavior at the next moment according to the Q-value table to maximize the efficiency of the area search, expressed as follows:
$p_r = a_i' = \arg\max_{a_i} Q_i(s_i, a_i)$
where $s_i$ represents the state of the agent at the current moment, $a_i$ represents the behavior selected by the agent at the current moment, and $a_i'$ represents the optimal search state selected for the next moment;
According to $a_i'$, the optimal desired position $P_r$ in S2 is obtained; the speed and position of the agent are calculated through $P_r$, and the Q-value table is iteratively queried to realize the global search of the region by the cluster.
the invention has the beneficial effects that: the invention provides a cooperative Q-learning algorithm based on the traditional Q-learning, realizes the sharing of the learning experience of the neighbor, filters out useless experience by a screening mode in the sharing process, greatly reduces the communication traffic between intelligent agents while improving the learning efficiency, and improves the cluster searching efficiency and effect.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following.
As shown in fig. 1, a multi-agent area search method based on cooperative reinforcement learning includes the following steps:
s1, establishing a motion model of a cluster system;
s2, defining a fusion mode of a gamma information map and a cluster information map;
s3, defining a state space and a behavior space required by reinforcement learning training;
s4, defining an interactive reinforcement learning training method according to the state space and the behavior space;
and S5, acquiring a Q value table obtained by training, carrying out region search according to the motion model, and determining the position of the next moment according to the Q value table.
Further, the step S1 includes the following sub-steps:
Based on a flocking cluster control algorithm, assume that a cluster V = {1, 2, ..., p} contains p agents; the ith agent in the cluster is denoted agent i, and its kinetic model is expressed as the following equation:

$\dot{p}_i = v_i, \qquad \dot{v}_i = u_i$

where $p_i$ is the location of agent i, $v_i$ is the speed of agent i, and $u_i$, the acceleration of agent i, is the control input of the cluster agent;
During the search process, the control input of each agent of the cluster is expressed as:

$u_i = u_i^{\alpha} + u_i^{\gamma}$

where $u_i^{\alpha}$ is the control input by which the cluster agents avoid collisions, and $u_i^{\gamma}$ is the control quantity moving the cluster agent to the desired position;
where $c_s^{\alpha}$ is a positive constant, and the potential field force between agent i and agent j is defined as follows:

$\phi_{\alpha}(z) = \rho_h(z / r_{\alpha})\,\phi(z - d_{\alpha}), \qquad \phi(z) = \tfrac{1}{2}\big[(a + b)\,\sigma_1(z + c) + (a - b)\big]$

where z is the input quantity and $p_i$ is the location of cluster agent i;
$d_{\alpha} = \|d\|_{\sigma}$
where $r_{\alpha}$ is the communication distance between cluster agents, and $\sigma_1$, a, b and c are self-defined parameters;
where h and l are constants; the design of the $\rho_h$ function guarantees the smoothness of the potential field function. To guarantee that the norm is differentiable, the σ-norm is defined as:

$\|z\|_{\sigma} = \frac{1}{\varepsilon}\left(\sqrt{1 + \varepsilon \|z\|^{2}} - 1\right)$

where ε is a self-defined parameter;
The control quantity moving the cluster agent to the desired position is as follows:

$u_i^{\gamma} = -c_1^{\gamma}(p_i - p_{\gamma}) - c_2^{\gamma} v_i$

where $c_1^{\gamma}$ and $c_2^{\gamma}$ are the proportional and differential control parameters of the PID algorithm, $v_i$ is the speed of agent i, and $p_{\gamma}$ is the expected location of agent i at the next moment.
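As an illustration of this motion model, the following Python sketch integrates the double-integrator dynamics with a control input split into a collision-avoidance term and a goal-tracking term. The time step, the gains, and the simple repulsive potential used for $u_i^{\alpha}$ are assumptions for illustration, not the patent's exact potential field function:

```python
import numpy as np

DT = 0.1          # integration step (assumed)
C1_GAMMA = 1.0    # proportional gain of the goal term (assumed)
C2_GAMMA = 0.6    # differential gain of the goal term (assumed)

def u_alpha(p_i, neighbor_positions, d_safe=1.0, c_alpha=2.0):
    """Collision-avoidance input: repel from neighbors closer than d_safe."""
    u = np.zeros_like(p_i)
    for p_j in neighbor_positions:
        diff = p_i - p_j
        dist = np.linalg.norm(diff)
        if 0.0 < dist < d_safe:
            u += c_alpha * (d_safe - dist) * diff / dist
    return u

def u_gamma(p_i, v_i, p_goal):
    """Goal-tracking input u_i^gamma = -c1*(p_i - p_gamma) - c2*v_i."""
    return -C1_GAMMA * (p_i - p_goal) - C2_GAMMA * v_i

def step(p_i, v_i, neighbor_positions, p_goal):
    """One Euler step of the double integrator: dp = v*dt, dv = u*dt."""
    u_i = u_alpha(p_i, neighbor_positions) + u_gamma(p_i, v_i, p_goal)
    return p_i + v_i * DT, v_i + u_i * DT
```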
Further, the step S2 includes the following sub-steps:
Assume the traversal area is an m × n rectangular area, and quantize the area to be searched into a γ information map of k × l cells, where each quantized cell corresponds to a γ point; the complete search of the area is thereby converted into a complete traversal of the γ points in the information map, and the γ points form the γ-information-map set of agent i:

$m_i(\gamma) = \{\gamma_{x,y}\}, \quad x = 1, 2, \ldots, k, \; y = 1, 2, \ldots, l;$
where k and l are obtained by:

where $r_s$ is a self-defined parameter representing the perception radius of agent i;
The γ information maps of all agents in the cluster, $\{m_1(\gamma_{x,y}), m_2(\gamma_{x,y}), \ldots, m_p(\gamma_{x,y})\}$, are obtained; if agent i has traversed a γ point, the information of that γ point is $m_i(\gamma_{x,y}) = 1$, otherwise $m_i(\gamma_{x,y}) = 0$. Agent 1, agent 2, ..., agent p establish communication, and each agent fuses its own γ information map with the γ information maps of its neighbors, where the fusion formula is:

$m_s(\gamma_{x,y}) = \max_{i \in V} m_i(\gamma_{x,y})$

where $m_i(\gamma_{x,y})$ is the γ information map of agent i, $m_s(\gamma_{x,y})$ is the complete γ information map of the cluster, and V is the set of cluster agents.
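A minimal sketch of this map fusion, assuming binary k × l coverage maps and an element-wise maximum (logical OR) as the fusion rule; the grid size and helper names are illustrative:

```python
import numpy as np

k, l = 10, 8  # information-map resolution (assumed for illustration)

def new_map():
    """Binary gamma information map m_i: 1 = gamma point already traversed."""
    return np.zeros((k, l), dtype=np.uint8)

def fuse(maps):
    """Cluster map m_s(gamma_{x,y}) = max over agents of m_i(gamma_{x,y})."""
    fused = new_map()
    for m in maps:
        np.maximum(fused, m, out=fused)
    return fused

# Example: two agents that visited different gamma points share one map.
m1, m2 = new_map(), new_map()
m1[0, 0] = 1   # agent 1 traversed gamma point (0, 0)
m2[3, 4] = 1   # agent 2 traversed gamma point (3, 4)
m_s = fuse([m1, m2])
assert m_s[0, 0] == 1 and m_s[3, 4] == 1
```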
Further, the step S3 includes:
The state space of each agent of the cluster is acquired, and the state of agent i is defined as follows:

where $M_i(\gamma)$ is the γ-information-map coverage of node i, and $\gamma_{x,y}$ is the γ-map location of node i at the next moment;
The behavior space of each agent in the cluster is acquired, where a behavior is the selection of a γ point. When node i is in some state $S_i$, the selectable γ points are the current γ-map location and the 8 locations surrounding it; denoting these 9 locations by 1 to 9, the node behavior space is defined as the following equation:
$A_i = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$.
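The 9-element behavior space can be illustrated as a mapping from action numbers to grid offsets. The particular row-major numbering below is an assumption; the patent only states that the current cell and its 8 surrounding cells are denoted 1 to 9:

```python
# Behaviors 1..9 mapped to offsets on the gamma map (row-major; assumed order).
ACTION_OFFSETS = {
    1: (-1, -1), 2: (-1, 0), 3: (-1, 1),
    4: (0, -1),  5: (0, 0),  6: (0, 1),
    7: (1, -1),  8: (1, 0),  9: (1, 1),
}

def next_gamma_point(x, y, action, k, l):
    """Apply one behavior from A_i = {1,...,9}, clamped to the k-by-l map."""
    dx, dy = ACTION_OFFSETS[action]
    nx = min(max(x + dx, 0), k - 1)
    ny = min(max(y + dy, 0), l - 1)
    return nx, ny
```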
further, the step S4 includes:
According to the agent's state and behavior, for a typical Q-learning algorithm the Q-value table update function is as follows:

$Q_i^{k+1}(s_i, a_i) = Q_i^{k}(s_i, a_i) + \alpha \big[ r_i + \eta \max_{a_i'} Q_i^{k}(s_i', a_i') - Q_i^{k}(s_i, a_i) \big]$

where k denotes the kth training, α is the learning rate, η is the discount factor, $a_i'$ denotes the next action, and $s_i'$ the next state.
To reduce the computational complexity and communication traffic of the learning algorithm and to accelerate its convergence, an agent in the cluster can obtain the Q values of the other agents it is connected with. Only the state-action entries with larger Q values in the neighbors' Q-value tables are considered as references for updating the agent's Q value, so the Q-value table of the ith agent at the (k+1)th iteration is updated as follows:

$Q_i^{k+1}(s_i, a_i) = \sum_{j \in N_i} w_j\, Q_j^{k}(s_i, a_i)$

where $Q_j^{k}(s_i, a_i)$ is the Q value of the jth agent, $N_i$ is the set of neighbors of agent i, and the weights $w_j$ are defined as follows:
where $q_i$ represents the location of the ith agent of the cluster, $r_a$ is a constant representing the adjacency radius, and $h_r(\cdot)$ is a threshold function defined as follows:
$r_i$ is the return function, defined as follows:
where $\gamma_{x,y}'$ is the next γ point obtained by performing action $a_i$, $0 < c_r < 1$ is a constant, $k_r$ is the number of repeated traversals, T is the time consumed in traversing the γ information map or covering the dynamic area, and r(T) is defined as follows:
where the coefficients are constants, $r_{ref}$ is a constant giving the standard return value for the entire traversal process, and $T_{min}$, the minimum traversal time under ideal conditions, is calculated by the following formula:
where m and n are the size of the search area, k and l are the numbers of cells of the corresponding information map, and $v_{max}$ is a constant representing the maximum speed. The agent selects the action with the maximum weight according to the current environment to update the Q-value table. Compared with the traditional reinforcement learning method, this cooperative reinforcement learning method improves efficiency: the interaction of Q-value tables among the agents in the cluster optimizes the training process and reduces the training time.
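The following sketch illustrates this cooperative update: a standard Q-learning step followed by a merge that consults only neighbors whose Q value for the visited state-action pair exceeds the agent's own. The uniform weighting is an assumption standing in for the patent's weights $w_j$, which depend on the adjacency radius $r_a$ and the threshold function $h_r(\cdot)$:

```python
import numpy as np

ALPHA, ETA = 0.1, 0.9  # learning rate and discount factor (assumed values)

def q_learning_step(Q, s, a, r, s_next):
    """Standard update: Q(s,a) += alpha*(r + eta*max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += ALPHA * (r + ETA * np.max(Q[s_next]) - Q[s, a])

def cooperative_update(Q_i, neighbor_Qs, s, a):
    """Blend in neighbor experience for (s, a) when it beats the agent's own."""
    better = [Q_j[s, a] for Q_j in neighbor_Qs if Q_j[s, a] > Q_i[s, a]]
    if better:
        pool = better + [Q_i[s, a]]
        Q_i[s, a] = sum(pool) / len(pool)  # uniform weights w_j (assumed)
```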
Further, the step S5 includes:
Steps S1-S4 are repeated to iteratively update the Q-value tables of the agents in the cluster until the Q-value tables converge. When the whole cluster carries out search tasks, the cluster area-search algorithm trained by collaborative reinforcement learning generates the Q-value table for the behavior of the whole cluster at the next moment; after the collaborative-reinforcement-learning training, each agent selects the best behavior at the next moment according to the Q-value table to maximize the efficiency of the area search, expressed as follows:
$p_r = a_i' = \arg\max_{a_i} Q_i(s_i, a_i)$
where $s_i$ represents the state of the agent at the current moment, $a_i$ represents the behavior selected by the agent at the current moment, and $a_i'$ represents the optimal search state selected for the next moment;
According to $a_i'$, the optimal desired position $P_r$ in S2 is obtained; the speed and position of the agent are calculated through $P_r$, and the Q-value table is iteratively queried to realize the global search of the region by the cluster.
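A self-contained toy rollout of this search phase, assuming a trained Q-value table (here a random stand-in) over a small grid: the agent greedily queries the table, moves to the chosen γ cell, and marks it as covered:

```python
import numpy as np

# Behaviors 0..8 as offsets on a small gamma map (order assumed).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),  (0, 0),  (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

k = l = 3
rng = np.random.default_rng(0)
Q = rng.random((k * l, 9))               # stand-in for a converged Q table
coverage = np.zeros((k, l), dtype=bool)

x, y = 0, 0
for _ in range(50):                      # iterative Q-table queries
    s = x * l + y                        # state index of the current cell
    a = int(np.argmax(Q[s]))             # a_i' = argmax_a Q_i(s_i, a_i)
    dx, dy = OFFSETS[a]
    x = min(max(x + dx, 0), k - 1)       # desired position P_r, clamped
    y = min(max(y + dy, 0), l - 1)
    coverage[x, y] = True                # gamma point traversed
    if coverage.all():                   # region fully searched
        break
```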
the invention provides a cooperative Q-learning algorithm based on the traditional Q-learning, realizes the sharing of the learning experience of the neighbor, filters out useless experience by a screening mode in the sharing process, greatly reduces the communication traffic between intelligent agents while improving the learning efficiency, and improves the cluster searching efficiency and effect.
The foregoing is a preferred embodiment of the present invention. It is to be understood that the invention is not limited to the form disclosed herein and is not to be construed as excluding other embodiments; it is capable of use in various other combinations, modifications, and environments, and of changes within the scope of the inventive concept described herein, commensurate with the above teachings or the skill or knowledge of the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A multi-agent area searching method based on collaborative reinforcement learning is characterized in that: the method comprises the following steps:
s1, establishing a motion model of a cluster system;
s2, defining a fusion mode of a gamma information map and a cluster information map;
s3, defining a state space and a behavior space required by reinforcement learning training;
s4, defining an interactive reinforcement learning training method according to the state space and the behavior space;
and S5, acquiring a Q value table obtained by training, carrying out region search according to the motion model, and determining the position of the next moment according to the Q value table.
2. The multi-agent area search method based on cooperative reinforcement learning as claimed in claim 1, wherein: the step S1 includes the following sub-steps:
Based on a flocking cluster control algorithm, assume that a cluster V = {1, 2, ..., p} contains p agents; the ith agent in the cluster is denoted agent i, and its kinetic model is expressed as the following equation:

$\dot{p}_i = v_i, \qquad \dot{v}_i = u_i$

where $p_i$ is the location of agent i, $v_i$ is the speed of agent i, and $u_i$, the acceleration of agent i, is the control input of the cluster agent;
During the search process, the control input of each agent of the cluster is expressed as:

$u_i = u_i^{\alpha} + u_i^{\gamma}$

where $u_i^{\alpha}$ is the control input by which the cluster agents avoid collisions, and $u_i^{\gamma}$ is the control quantity moving the cluster agent to the desired position;
where $c_s^{\alpha}$ is a positive constant, and the potential field force between agent i and agent j is defined as follows:

$\phi_{\alpha}(z) = \rho_h(z / r_{\alpha})\,\phi(z - d_{\alpha}), \qquad \phi(z) = \tfrac{1}{2}\big[(a + b)\,\sigma_1(z + c) + (a - b)\big]$

where z is the input quantity and $p_i$ is the location of cluster agent i;
$d_{\alpha} = \|d\|_{\sigma}$
where $r_{\alpha}$ is the communication distance between cluster agents, and $\sigma_1$, a, b and c are self-defined parameters;
where h and l are constants; the design of the $\rho_h$ function guarantees the smoothness of the potential field function. To guarantee that the norm is differentiable, the σ-norm is defined as:

$\|z\|_{\sigma} = \frac{1}{\varepsilon}\left(\sqrt{1 + \varepsilon \|z\|^{2}} - 1\right)$

where ε is a self-defined parameter;
the control quantity moving the cluster agent to the desired position is as follows:

$u_i^{\gamma} = -c_1^{\gamma}(p_i - p_{\gamma}) - c_2^{\gamma} v_i$.
3. The multi-agent area search method based on cooperative reinforcement learning as claimed in claim 1, wherein: the step S2 includes the following sub-steps:
Assume the traversal area is an m × n rectangular area, and quantize the area to be searched into a γ information map of k × l cells, where each quantized cell corresponds to a γ point; the complete search of the area is thereby converted into a complete traversal of the γ points in the information map, and the γ points form the γ-information-map set of agent i:

$m_i(\gamma) = \{\gamma_{x,y}\}, \quad x = 1, 2, \ldots, k, \; y = 1, 2, \ldots, l;$
where k and l are obtained by:

where $r_s$ is a self-defined parameter representing the perception radius of agent i;
The γ information maps of all agents in the cluster, $\{m_1(\gamma_{x,y}), m_2(\gamma_{x,y}), \ldots, m_p(\gamma_{x,y})\}$, are obtained; if agent i has traversed a γ point, the information of that γ point is $m_i(\gamma_{x,y}) = 1$, otherwise $m_i(\gamma_{x,y}) = 0$. Agent 1, agent 2, ..., agent p establish communication, and each agent fuses its own γ information map with the γ information maps of its neighbors, where the fusion formula is:

$m_s(\gamma_{x,y}) = \max_{i \in V} m_i(\gamma_{x,y})$

where $m_i(\gamma_{x,y})$ is the γ information map of agent i, $m_s(\gamma_{x,y})$ is the complete γ information map of the cluster, and V is the set of cluster agents.
4. The multi-agent area search method based on cooperative reinforcement learning as claimed in claim 1, wherein: the step S3 includes:
The state space of each agent of the cluster is acquired, and the state of agent i is defined as follows:

where $M_i(\gamma)$ is the γ-information-map coverage of node i, and $\gamma_{x,y}$ is the γ-map location of node i at the next moment;
The behavior space of each agent in the cluster is acquired, where a behavior is the selection of a γ point. When node i is in some state $S_i$, the selectable γ points are the current γ-map location and the 8 locations surrounding it; denoting these 9 locations by 1 to 9, the node behavior space is defined as the following equation:
$A_i = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$.
5. the multi-agent area search method based on cooperative reinforcement learning as claimed in claim 1, wherein: the step S4 includes:
According to the agent's state and behavior, for a typical Q-learning algorithm the Q-value table update function is as follows:

$Q_i^{k+1}(s_i, a_i) = Q_i^{k}(s_i, a_i) + \alpha \big[ r_i + \eta \max_{a_i'} Q_i^{k}(s_i', a_i') - Q_i^{k}(s_i, a_i) \big]$

where k denotes the kth training, α is the learning rate, η is the discount factor, $a_i'$ denotes the next action, and $s_i'$ the next state.
To reduce the computational complexity and communication traffic of the learning algorithm and to accelerate its convergence, an agent in the cluster can obtain the Q values of the other agents it is connected with. Only the state-action entries with larger Q values in the neighbors' Q-value tables are considered as references for updating the agent's Q value, so the Q-value table of the ith agent at the (k+1)th iteration is updated as follows:

$Q_i^{k+1}(s_i, a_i) = \sum_{j \in N_i} w_j\, Q_j^{k}(s_i, a_i)$

where $Q_j^{k}(s_i, a_i)$ is the Q value of the jth agent, $N_i$ is the set of neighbors of agent i, and the weights $w_j$ are defined as follows:
where $q_i$ represents the location of the ith agent of the cluster, $r_a$ is a constant representing the adjacency radius, and $h_r(\cdot)$ is a threshold function defined as follows:
$r_i$ is the return function, defined as follows:
where $\gamma_{x,y}'$ is the next γ point obtained by performing action $a_i$, $0 < c_r < 1$ is a constant, $k_r$ is the number of repeated traversals, T is the time consumed in traversing the γ information map or covering the dynamic area, and r(T) is defined as follows:
where the coefficients are constants, $r_{ref}$ is a constant giving the standard return value for the entire traversal process, and $T_{min}$, the minimum traversal time under ideal conditions, is calculated by the following formula:
where m and n are the size of the search area, k and l are the numbers of cells of the corresponding information map, and $v_{max}$ is a constant representing the maximum speed. The agent selects the action with the maximum weight according to the current environment to update the Q-value table. Compared with the traditional reinforcement learning method, this cooperative reinforcement learning method improves efficiency: the interaction of Q-value tables among the agents in the cluster optimizes the training process and reduces the training time.
6. The multi-agent area search method based on cooperative reinforcement learning as claimed in claim 1, wherein: the step S5 includes:
Steps S1-S4 are repeated to iteratively update the Q-value tables of the agents in the cluster until the Q-value tables converge; when the whole cluster carries out search tasks, the cluster area-search algorithm trained by collaborative reinforcement learning generates the Q-value table for the behavior of the whole cluster at the next moment;
after the collaborative-reinforcement-learning training, each agent selects the best behavior at the next moment according to the Q-value table to maximize the efficiency of the area search, expressed as follows:
$p_r = a_i' = \arg\max_{a_i} Q_i(s_i, a_i)$
where $s_i$ represents the state of the agent at the current moment, $a_i$ represents the behavior selected by the agent at the current moment, and $a_i'$ represents the optimal search state selected for the next moment;
According to $a_i'$, the optimal desired position $P_r$ in S2 is obtained; the speed and position of the agent are calculated through $P_r$, and the Q-value table is iteratively queried to realize the global search of the region by the cluster.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010710554.6A CN111880564A (en) | 2020-07-22 | 2020-07-22 | Multi-agent area searching method based on collaborative reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010710554.6A CN111880564A (en) | 2020-07-22 | 2020-07-22 | Multi-agent area searching method based on collaborative reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111880564A true CN111880564A (en) | 2020-11-03 |
Family
ID=73155230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010710554.6A Pending CN111880564A (en) | 2020-07-22 | 2020-07-22 | Multi-agent area searching method based on collaborative reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111880564A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040017313A1 (en) * | 2002-03-12 | 2004-01-29 | Alberto Menache | Motion tracking system and method |
CN110109358A (en) * | 2019-05-17 | 2019-08-09 | 电子科技大学 | A kind of mixing multiple agent cooperative control method based on feedback |
CN110995006A (en) * | 2019-11-28 | 2020-04-10 | 深圳第三代半导体研究院 | Design method of power electronic transformer |
Non-Patent Citations (1)
Title |
---|
XIAO JIAN: "Research on Flocking Swarm Cooperative Control Algorithms Based on Reinforcement Learning", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113189983B (en) * | 2021-04-13 | 2022-05-31 | 中国人民解放军国防科技大学 | Open scene-oriented multi-robot cooperative multi-target sampling method |
CN113189983A (en) * | 2021-04-13 | 2021-07-30 | 中国人民解放军国防科技大学 | Open scene-oriented multi-robot cooperative multi-target sampling method |
CN113156954A (en) * | 2021-04-25 | 2021-07-23 | 电子科技大学 | Multi-agent cluster obstacle avoidance method based on reinforcement learning |
CN113592162A (en) * | 2021-07-22 | 2021-11-02 | 西北工业大学 | Multi-agent reinforcement learning-based multi-underwater unmanned aircraft collaborative search method |
CN113592162B (en) * | 2021-07-22 | 2023-06-02 | 西北工业大学 | Multi-agent reinforcement learning-based multi-underwater unmanned vehicle collaborative search method |
CN113515130A (en) * | 2021-08-26 | 2021-10-19 | 鲁东大学 | Method and storage medium for agent path planning |
CN113515130B (en) * | 2021-08-26 | 2024-02-02 | 鲁东大学 | Method and storage medium for agent path planning |
CN113645317A (en) * | 2021-10-15 | 2021-11-12 | 中国科学院自动化研究所 | Loose cluster control method, device, equipment, medium and product |
CN113645317B (en) * | 2021-10-15 | 2022-01-18 | 中国科学院自动化研究所 | Loose cluster control method, device, equipment, medium and product |
CN114326749A (en) * | 2022-01-11 | 2022-04-12 | 电子科技大学长三角研究院(衢州) | Deep Q-Learning-based cluster area coverage method |
CN114326749B (en) * | 2022-01-11 | 2023-10-13 | 电子科技大学长三角研究院(衢州) | Deep Q-Learning-based cluster area coverage method |
CN114610024A (en) * | 2022-02-25 | 2022-06-10 | 电子科技大学 | Multi-agent collaborative search energy-saving method used in mountain environment |
CN114610024B (en) * | 2022-02-25 | 2023-06-02 | 电子科技大学 | Multi-agent collaborative searching energy-saving method for mountain land |
CN114815820A (en) * | 2022-04-18 | 2022-07-29 | 电子科技大学 | Intelligent vehicle linear path planning method based on adaptive filtering |
CN114815820B (en) * | 2022-04-18 | 2023-10-03 | 电子科技大学 | Intelligent body trolley linear path planning method based on adaptive filtering |
CN114764251A (en) * | 2022-05-13 | 2022-07-19 | 电子科技大学 | Energy-saving method for multi-agent collaborative search based on energy consumption model |
CN114764251B (en) * | 2022-05-13 | 2023-10-10 | 电子科技大学 | Multi-agent collaborative search energy-saving method based on energy consumption model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20201103 |