CN109794937B - Football robot cooperation method based on reinforcement learning - Google Patents


Publication number
CN109794937B
CN109794937B (application CN201910083609.2A)
Authority
CN
China
Legal status
Active
Application number
CN201910083609.2A
Other languages
Chinese (zh)
Other versions
CN109794937A (en)
Inventor
胡丽娟
梁志伟
李汉辉
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN201910083609.2A priority Critical patent/CN109794937B/en
Publication of CN109794937A publication Critical patent/CN109794937A/en
Application granted granted Critical
Publication of CN109794937B publication Critical patent/CN109794937B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a football robot cooperation method based on reinforcement learning, which comprises the following steps: S1, constructing a reinforcement learning basic model of the football robot based on a Sarsa(λ) algorithm with communication added, and setting a reward and punishment mechanism r of the reinforcement learning basic model; S2, defining a specified number of state variables based on the distances and angles between the football robots; S3, setting an operable action set of the football robot, the football robot selecting its next action based on the reward and punishment mechanism r, the state variables and mutual communication between the football robots. By establishing a reward and punishment mechanism in the constructed reinforcement learning basic model, the football robot can select its next action according to the current environment and the reward and punishment mechanism, and the football robots can learn and update through mutual communication, which effectively improves their cooperation efficiency.

Description

Football robot cooperation method based on reinforcement learning
Technical Field
The invention belongs to the field of football robots, and particularly relates to a football robot cooperation method based on reinforcement learning.
Background
As a typical multi-robot system, the football robot adversarial match provides a good experimental platform for research on intelligence theory and the integrated application of multiple technologies. The demand is ever stronger for football robots that can autonomously take appropriate measures in response to changes in the surrounding environment during motion, which involves a series of research topics such as robot positioning, path planning, coordination control, target tracking and decision making.
In recent years, many scholars and experts have produced substantial results. For example, the Chinese patent application No. 201120008202.2 discloses an intelligent robot game device comprising a mechanical part and a circuit control part; the mechanical part includes a table, a console and robots, the circuit control part includes a control module on the console and a controlled module on the robots, and an adversarial game scene can be formed. The Chinese patent with application No. 201010175496.8 discloses a robot education platform comprising a box body and, arranged in it, a mechanical assembly, a sensor unit, a control unit, an execution unit, an interface conversion unit, a task software optical disk and a power module, suitable for various classroom teaching experiments. The Chinese patent with application No. 200410016867.2 discloses an embedded direct drive device for a football robot which, aiming at the deficiencies of the rotating parts of existing autonomous robots, provides a compact and flexibly debuggable drive device that gives the robot fast movement, accurate positioning, impact resistance and strong antagonism. The Chinese patent with application No. 201120313058.3 discloses binocular vision navigation for an indoor football robot, which adopts a global infrared vision positioning mode combined with sensor information to achieve high-precision positioning and navigation of an indoor mobile robot; however, it is only applicable to situations where the obstacles are fixed, the environment is stable, and a single robot operates.
The prior art focuses mainly on the mechanical design of robot platforms, modifications of robot drive devices, and motion control in a fixed environment or for a single robot; no coordination and cooperation control scheme applicable to adversarial football robot matches has been seen. Moreover, in existing football robot matches it often happens that a football robot cannot determine its own pose on the field and spins in place, so that scoring chances are missed and goals are delayed.
Disclosure of Invention
Aiming at the problem of low cooperation efficiency of football robots in football robot matches in the prior art, the invention provides a football robot cooperation method based on reinforcement learning, which achieves high cooperation efficiency by constructing a reinforcement learning basic model of the football robot based on a Sarsa(λ) algorithm with communication added and by letting the football robots communicate with each other. The specific technical scheme is as follows:
a method of reinforcement learning-based soccer robot collaboration, the method comprising:
S1, constructing a reinforcement learning basic model of the football robot based on a Sarsa(λ) algorithm with communication added, and setting a reward and punishment mechanism r of the reinforcement learning basic model;
S2, defining a specified number of state variables based on the distances and angles between the football robots;
S3, setting an operable action set of the football robot, the football robot selecting the next action based on the reward and punishment mechanism r, the state variables and mutual communication between the football robots.
Further, the football robots include attack-end robots and defense-end robots, and the number of state variables is set based on the sum of the numbers of attack-end and defense-end robots.
Further, the method further comprises: a designated football robot among the attack-end robots or the defense-end robots communicates with the remaining football robots through the Sarsa(λ) algorithm, and broadcasts its own state and action messages through the communication.
Further, the reward and punishment mechanism r is:
r = 1, if the attacking side scores a goal; r = 0.01, if a pass succeeds; r = 0, otherwise
Further, the operable action set includes three types: passing, dribbling and shooting.
The football robot cooperation method based on reinforcement learning of the invention is applied to a football robot match comprising attack-end robots and defense-end robots. For all football robots of the attacking end, or all football robots of the defending end, a reinforcement learning basic model of the football robot is first established based on a Sarsa(λ) algorithm with communication added; a basic action set and a reward and punishment mechanism of the football robot are established in the reinforcement learning basic model, and a specified number of state variables are set according to the number of football robots. The football robots can then select the actions to execute in the match according to the reward and punishment mechanism, their own environment, and the communication information exchanged with the other football robots, thereby realizing cooperation. Compared with the prior art, the invention can effectively improve the cooperation efficiency of the football robots and improve the watchability of football robot matches.
Drawings
FIG. 1 is a block flow diagram of a cooperation method of a soccer robot based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a basic reinforcement learning model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of state variables of players in an embodiment employing the method of the present invention;
FIG. 4 is a schematic diagram of a simulation experiment on an HFO platform using the method of the present invention;
FIGS. 5(a) and 5(b) are graphs showing the comparison of the experimental results of the cooperative efficiency of the soccer robot with and without communication in the embodiment of the present invention;
FIG. 6 is a comparison graph showing the learning performance of the soccer robot according to the present invention;
fig. 7 is a comparison graph showing the learning performance of the intercommunication between different soccer robots according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
Example one
Referring to fig. 1, in an embodiment of the present invention, a soccer robot cooperation method based on reinforcement learning is provided, which specifically includes:
s1, constructing a reinforced learning basic model of the football robot based on the Sarsa (lambda) algorithm added with communication, and setting a reward and punishment mechanism r of the reinforced learning basic model.
Referring to fig. 2, the principle of the reinforcement learning basic model is as follows: the football robot selects an action while sensing the current environment; the environment state transitions to a new state, which correspondingly generates a reinforcement signal fed back to the football robot; the football robot then determines its next action according to the current environment information and the reinforcement signal. The key points of the football robot reinforcement learning in the invention include:
Policy: a key component of a reinforcement learning agent that provides a mapping from environmental perception states to control actions. Value function: also known as the return value; it evaluates the behavior derived from the existing policy and estimates the merit of the current state, i.e., the consequence of taking an action under the current policy; through continual correction, the value function in turn corrects the policy. Reward and punishment value: used to estimate the instantaneous desirability of the environmental perception state produced by one control action; that is, an action taken by the football robot in a certain state obtains a corresponding reward and punishment value, positive when the expectation is met and negative when it is not. Environment model: a planning tool for predicting future behavior scenarios in view of future possibilities.
In the embodiment of the invention, during the learning process of the reinforcement learning basic model, the football robot continually tries to select actions, and the reinforcement signal provided by the environment evaluates the quality of an action rather than telling the system how to generate correct actions; meanwhile, because the external environment provides little information for adjusting actions, the reinforcement learning system of the football robot must learn from the robot's own experience. The football robot finally obtains the optimal strategy, namely how to cooperate to score, by adjusting the evaluation value of actions through the reinforcement signal.
The Sarsa(λ) algorithm adopted by the invention is a variant of the Sarsa algorithm. The working principle of Sarsa is as follows: the name comes from updating the Q value using the experience State → Action → Reward → State′ → Action′, where the Q value is the value of the policy being executed. An experience of Sarsa has the form (s, a, r, s′, a′), meaning: the agent performs action a in the current state s, receives the reward and punishment value r, arrives at state s′, and there decides to perform action a′; the experience (s, a, r, s′, a′) provides a new estimate, r + γQ(s′, a′), for updating Q(s, a). Sarsa(λ) differs in that, for each state s and action a, it maintains an eligibility trace e(s, a); each time a new reward or punishment is received, Q(s, a) is updated for every pair, but only those whose trace exceeds a certain threshold need be touched, which is not only more efficient but also loses little accuracy. The specific procedure of the Sarsa(λ) algorithm is:
Sarsa(λ,S,A,γ,α)
inputting:
s is a set of states, A is a set of actions, γ is a discount rate, α is a step size, and λ is an attenuation rate
Internal state:
real value arrays Q (s, a) and e (s, a), previous state s, previous behavior a
begin:
Randomly initialize Q(s, a)
For all s, a, initialize e(s, a) = 0
Observe the current state s
Select a using a Q-based policy
repeat forever:
Perform action a
Observe reward and punishment r and state s′
Select action a′ using a Q-based policy
δ ← r + γQ(s′, a′) − Q(s, a)
e(s, a) ← e(s, a) + 1
For all s″, a″:
Q(s″, a″) ← Q(s″, a″) + αδe(s″, a″)
e(s″, a″) ← γλe(s″, a″)
s ← s′
a ← a′
end-repeat
End
Where e (s, a) is also called the eligibility trace, where s and a are the set of all states and all actions, respectively; after each action is performed, the Q value of each "state-action" pair is updated.
Preferably, the reward and punishment mechanism r of the present invention is:
r = 1, if the attacking side scores a goal; r = 0.01, if a pass succeeds; r = 0, otherwise
In the invention, the goal of the task is for an attacking player to score, so the reward and punishment value r after a goal is set to 1, and other actions are given correspondingly small reward and punishment values r; experiments show that a successful pass can also be given a small reward and punishment value r (e.g. 0.01), and r = 0 is equally valid here; no discount is used because each segment ends by chance.
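The reward scheme described above can be sketched as a simple function. The value 0.01 for a successful pass is the example value from the text (r = 0 works equally well), and the event names are illustrative assumptions:

```python
def reward(event):
    """Reward and punishment mechanism r, reconstructed from the description:
    1 for a goal, a small bonus (0.01, illustrative) for a successful pass,
    0 otherwise."""
    if event == "goal":
        return 1.0
    if event == "pass_success":
        return 0.01
    return 0.0
```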
S2, defining a specified number of state variables based on the distance and the angle between the football robots;
In the embodiment of the invention, since the method is applied to a football robot match, the football robots include attack-end robots and defense-end robots, and the number of state variables is set based on the sum of the two; referring specifically to fig. 3, in this embodiment the attacking players are white and the defending players are black. The attacking players are indexed by their distance to the ball as O1, O2, …, Om, with O1 nearest to the ball; similarly, the defending players are indexed by their distance to the ball as D1, D2, …, Dn, and the goalkeeper, who may be any one of the defending players, is denoted Dg. For a football robot match with four attacking players and five defending players, the invention describes the positional relationship of the football robots with the following 17 state variables: dist(O1,O2), dist(O1,O3), dist(O1,O4), the distance from the ball holder O1 to each teammate; dist(O1,Dg), the distance from the ball holder O1 to the goalkeeper; dist(O1,GL), dist(O2,GL), dist(O3,GL), dist(O4,GL), the distance of each attacking player from the goal line GL; min_dist(O1,D), min_dist(O2,D), min_dist(O3,D), min_dist(O4,D), the closest distance of each attacking player to a defending player; min_ang(O2,O1,D), min_ang(O3,O1,D), min_ang(O4,O1,D), the minimum angle ∠OiO1D over all defending players D; min_dist(O1,Ddcone), the closest distance from the ball holder O1 to a defending player within the cone Ddcone, where Ddcone is the cone with vertex O1, half-angle 60 degrees, and axis passing through the goal; and max_goal_ang(O1), the maximum angle max(∠GPleftO1Dg, ∠GPrightO1Dg), i.e., the largest angle formed by the ray from the ball holder O1 to the goalkeeper and the rays from O1 to the two goal posts GPleft and GPright. Among these, dist(O1,GL), max_goal_ang(O1) and dist(O1,Dg) directly influence the selection of the shooting action; min_dist(O1,D) and min_dist(O1,Ddcone) directly influence the selection of the dribbling action; and the other state variables influence the selection of the passing action.
In this embodiment, only the attack efficiency of the attacking players is considered, so the number of state variables is linearly related to the number of attacking players and unrelated to the number of defending players; of course, the number of state variables for defending players would bear the same linear relationship to their number.
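As a concrete illustration, the distance- and angle-based state variables can be computed from player coordinates as below. This is a sketch of a subset of the 17 variables (the cone-restricted distance and the maximum goal angle are omitted); the function names, the coordinate convention, and the goal line being a vertical line at `goal_line_x` are all assumptions:

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle(a, vertex, b):
    """Angle a-vertex-b in degrees, as used for min_ang(Oi, O1, D)."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / n))))

def state_variables(attackers, defenders, goalkeeper, goal_line_x):
    """Sketch of 15 of the 17 state variables for 4 attackers; positions are
    (x, y) tuples and attackers[0] is the ball holder O1 (nearest the ball).
    Keys follow the patent's notation."""
    O1 = attackers[0]
    s = {}
    for i, Oi in enumerate(attackers[1:], start=2):
        s[f"dist(O1,O{i})"] = dist(O1, Oi)              # holder to teammates
    s["dist(O1,Dg)"] = dist(O1, goalkeeper)             # holder to goalkeeper
    for i, Oi in enumerate(attackers, start=1):
        s[f"dist(O{i},GL)"] = abs(goal_line_x - Oi[0])  # to goal line
        s[f"min_dist(O{i},D)"] = min(dist(Oi, D) for D in defenders)
    for i, Oi in enumerate(attackers[1:], start=2):
        s[f"min_ang(O{i},O1,D)"] = min(angle(Oi, O1, D) for D in defenders)
    return s
```

Computing min_dist(O1, Ddcone) and max_goal_ang(O1) would additionally require the goal-post positions and the 60-degree cone test.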
And S3, setting an operable action set of the football robot, and selecting the next action by the football robot based on a reward and punishment mechanism r and the mutual communication of the state variables and the football robot.
The operable action set includes three types: passing, dribbling and shooting. The passing action PassK is indexed by distance to teammates rather than by an actual jersey number: PassK kicks the ball to the K-th nearest teammate, K = 2, 3, …, m. Dribble is the ball-carrying action that encourages the attacker to approach the goal; Shoot kicks the ball at the goal, scoring on success. When the ball is not held by an attacking player, the attacking player closest to the ball rushes directly to the ball (GetBall) to gain possession; meanwhile, the other attacking players always keep formation and push forward (GetOpen). The pseudo code is as follows:
if has ball possession then
perform an action from the set {Pass2, …, Passm, Dribble, Shoot}
else if is the attacking player closest to the ball then
GetBall (rush to the ball)
else
GetOpen (move to a formation grid point).
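The pseudo code above amounts to a simple per-robot dispatch, sketched below in Python. The learned choice among {Pass2, …, Passm, Dribble, Shoot} is abstracted as a callable `q_policy`; all names and the dict-based robot representation are illustrative assumptions:

```python
def select_behavior(robot, ball_holder, attackers, ball_pos, q_policy):
    """Behavior dispatch for one attacker, following the pseudo code:
    the ball holder asks the learned Q policy; otherwise the attacker
    nearest the ball chases it and everyone else keeps formation."""
    if robot is ball_holder:
        return q_policy(robot)      # learned choice: pass / dribble / shoot

    def d(p, q):                    # Euclidean distance helper
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    closest = min(attackers, key=lambda r: d(r["pos"], ball_pos))
    if robot is closest:
        return "GetBall"            # nearest free attacker rushes to the ball
    return "GetOpen"                # others keep formation and push forward
```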
In the embodiment of the invention, a designated football robot among the attack-end robots or the defense-end robots communicates with the remaining football robots through the Sarsa(λ) algorithm and broadcasts its own state and action messages; for example, when a player selects an action in state s and receives a reward or punishment r, a message is broadcast to the team. A concrete implementation is given by the following pseudo code:
reinforcement learning for communication
Initialization:
for all training segments do
s ← NULL
repeat
if has ball possession then
s ← getCurrentStateFromEnvironment (obtain the state of the current environment)
select and execute action a according to the Q function
r ← waitForRewardFromEnvironment (wait for the environment to evaluate the action and give the corresponding reward and punishment value)
broadcast the message (s, a, r)
else if is the attacking player closest to the ball then
GetBall (rush to the ball)
else
GetOpen (move to a formation grid point)
if the broadcast message (sm, am, rm) is received then
if state s is empty then
s, a, r ← sm, am, rm
else
s′, a′, r′ ← sm, am, rm
Q(s,a) ← Q(s,a) + α(r + γQ(s′,a′) − Q(s,a))
s, a, r ← s′, a′, r′
until the segment ends.
The above listing is the learning task of the football robot in the reinforcement learning basic model; the following three cases are defined as the end of a segment: a goal is scored, the ball crosses the boundary, or the defenders (including the goalkeeper) gain possession of the ball. Each football robot stores a current action-value function; the attacking player holding the ball performs an action, receives a reward or punishment, and then broadcasts the message (s, a, r) to the team. Each football robot is initialized with (s, a, r) at the beginning, and subsequent messages are dynamically updated according to (s′, a′, r′). Meanwhile, in order to guarantee message consistency, the method also provides a special football robot that serves as the medium for all communication: communication information between the football robots is first sent to this special football robot and then broadcast by it to the other football robots. Since the special football robot acts as the intermediate communication medium and is independent of the other football robots in communication, the integrity and reliability of the communication information can be achieved.
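The broadcast-driven Sarsa update in the listing can be sketched as a small class: each robot buffers the last message (s, a, r) and, when the next message (s′, a′, r′) arrives, applies Q(s,a) ← Q(s,a) + α(r + γQ(s′,a′) − Q(s,a)). The class and method names are assumptions, not from the patent:

```python
class CommLearner:
    """One robot's view of the communication-driven learning update:
    seed on the first broadcast, then apply a one-step Sarsa update
    for every subsequent broadcast message."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha, self.gamma = alpha, gamma
        self.Q = {}          # action-value table Q[(s, a)]
        self.prev = None     # last (s, a, r) heard over the team channel

    def on_broadcast(self, s_m, a_m, r_m):
        if self.prev is None:            # first message just seeds (s, a, r)
            self.prev = (s_m, a_m, r_m)
            return
        s, a, r = self.prev
        q = self.Q.get((s, a), 0.0)
        q_next = self.Q.get((s_m, a_m), 0.0)
        # Sarsa update driven by the received (s', a', r') message
        self.Q[(s, a)] = q + self.alpha * (r + self.gamma * q_next - q)
        self.prev = (s_m, a_m, r_m)      # shift: (s, a, r) <- (s', a', r')
```

Routing every `on_broadcast` call through a single mediator robot, as the text describes, would keep all teammates' tables consistent.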
Example two
The method of the first embodiment is verified on the HFO experimental platform with m attacking players and n defending players, where the defenders include a goalkeeper and n ≥ m. A half-court offense task is performed on half of the football field and begins near the half-way line, with the ball held by an attacking player. Referring to fig. 4, a classic 4v5 HFO setup is illustrated, where the white filled circle is the ball, with four attacking players and five defending players including the goalkeeper. During the experiment, in order to shoot and score successfully on the HFO platform, the attacking players must learn the three actions of passing, dribbling and shooting through the reinforcement learning basic model, while the simulated defending players try to block them.
Preferably, the invention first carries out 30 groups of experiments each for learning with and without communication between the football robots in order to analyze the error; the specific number of experimental groups can be chosen according to the practical situation, and this is only a preferred embodiment, not a limitation of the method. Referring to figs. 5(a) and 5(b), the x-axis represents the experiment group number and the y-axis represents the score y obtained after 20000 segments of learning, calculated by the formula
y = (1/20000) Σj=1..20000 rj
where rj is the reward and punishment value obtained by the agent at the end of the j-th segment. Fig. 5(a) shows the score obtained in each group of experiments with communicating learning, where the dashed line is the average of the 30 groups of learning, and the variance calculated by the variance formula is only 0.0005; fig. 5(b) shows the score obtained in each group of experiments with non-communicating learning, where the dashed line is the average of the 30 groups of learning, and the variance is only 0.0025. The scores obtained by the two kinds of learning can therefore each be represented by their average value, the error being negligible within the allowable range.
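The score used in the experiments is an average of per-segment rewards; a minimal sketch, with the averaging form reconstructed from the surrounding description:

```python
def segment_score(rewards):
    """Score y after N learning segments: the mean of the reward and
    punishment values r_j collected at the end of each segment
    (with r_j in {0, 1} this equals the goal success rate)."""
    return sum(rewards) / len(rewards)
```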
Referring to fig. 6, which compares the performance of the reinforcement learning algorithm in the football robot with and without communication, the x-axis represents the segment number, i.e., xi indicates that the agent is learning in the i-th segment, where i ∈ [1, 20000]; the y-axis represents the score yi obtained at the end of each segment's learning task, calculated by the formula
yi = (1/i) Σj=1..i rj
where rj is the reward and punishment value at the end of the j-th segment. It can be seen from the figure that during the first 5000 segments of learning, the performance of both the communicating and the non-communicating football robots increases roughly linearly, with the communicating learning rising more rapidly; after 5000 segments, the efficiency of the communicating learning increases markedly; after 20000 segments, both learning curves essentially converge, with a learning success rate of 20.09% without communication and about 31.08% with communication, an improvement of 10.99% over learning without communication. The comparison shows that adding communication improves the learning efficiency of the football robots.
In the embodiment of the invention, in order to eliminate hidden state, the field of view of the football robot is set to 360 degrees; meanwhile, to compare more clearly the performance of the reinforcement learning algorithm after communication is added, comparison experiments with communicating learning are carried out for different numbers of attacking players. Specifically, referring to fig. 7, football robot systems containing four players, three players, two players and a single player are each learned and updated; it can be seen that all four curves increase roughly linearly within a certain number of learning segments and then tend to converge. In every learning segment, the learning curve of the four communicating players always lies above the other curves; between 5000 and 10000 segments, the learning efficiency accelerates as the number of communicating players increases; and after 20000 segments, the learning score of the system containing four communicating players is far higher than that of systems containing fewer football robots. This comparison shows that, in the method, the more football robots the system contains, the higher the learning efficiency achieved through communication among them; that is, during an actual match, the method can effectively improve the cooperation efficiency of the whole football robot system and thereby its overall attacking efficiency.
In summary, the football robot cooperation method based on reinforcement learning of the invention is applied to a football robot match comprising attack-end robots and defense-end robots. For all football robots of the attacking end, or all football robots of the defending end, a reinforcement learning basic model of the football robot is first established based on a Sarsa(λ) algorithm with communication added; a basic action set and a reward and punishment mechanism of the football robot are established in the reinforcement learning basic model, and a specified number of state variables are set according to the number of football robots. The football robots can then select the actions to execute in the match according to the reward and punishment mechanism, their own environment, and the communication information exchanged with the other football robots, thereby realizing cooperation. Compared with the prior art, the invention can effectively improve the cooperation efficiency of the football robots and improve the watchability of football robot matches.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described in the foregoing detailed description, or equivalent changes may be made in some of the features of the embodiments described above. All equivalent structures made by using the contents of the specification and the attached drawings of the invention can be directly or indirectly applied to other related technical fields, and are also within the protection scope of the patent of the invention.

Claims (4)

1. A football robot cooperation method based on reinforcement learning is characterized in that the method comprises the following steps:
S1, constructing a reinforcement learning basic model of the football robot based on a Sarsa(λ) algorithm with communication added, and setting a reward and punishment mechanism r of the reinforcement learning basic model; the principle of the reinforcement learning basic model is that the football robot selects an action while sensing the current environment, the environment state transitions to a new state, the new state correspondingly generates a reinforcement signal fed back to the football robot, and the football robot determines its next action according to the current environment information and the reinforcement signal;
S2, defining a specified number of state variables based on the distance and the angle between the football robots, including: the distance between the attacking ball holder and each teammate, the distance between the ball holder and the goalkeeper, the distance between each attacking player and the goal line, the closest distance between each attacking player and a defending player, the minimum angle, the closest distance between the ball holder and a defending player within the ball holder's cone, and the maximum angle;
S3, setting an operable action set of the football robot, the football robot selecting the next action based on the reward and punishment mechanism r, the state variables and mutual communication between the football robots; the operable action set includes three types: passing, dribbling and shooting, wherein the passing action PassK, indexed by distance to teammates, kicks the ball to the K-th nearest teammate; Dribble is the ball-carrying action that encourages the attacker to approach the goal; Shoot kicks the ball at the goal to score; when the ball is not held by an attacking player, the attacking player closest to the ball rushes directly to the ball to gain possession; meanwhile, the other attacking players always keep formation and push forward.
2. The reinforcement-learning-based football robot cooperation method of claim 1, wherein the football robots include attack-end robots and defense-end robots, the number of state variables being set based on the sum of the numbers of attack-end and defense-end robots.
3. The reinforcement learning-based soccer robot collaboration method of claim 2, wherein the method further comprises: and the attacking-end robot or the appointed football robot in the defending-end robot communicates with the rest football robots through the Sarsa (lambda) algorithm, and broadcasts the state and action messages of the attacking-end robot or the defending-end robot through the communication.
4. The reinforcement-learning-based football robot cooperation method of claim 1, wherein the reward and punishment mechanism r is:
[Formula for the reward and punishment mechanism r provided as an image in the original publication]
CN201910083609.2A 2019-01-29 2019-01-29 Football robot cooperation method based on reinforcement learning Active CN109794937B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910083609.2A CN109794937B (en) 2019-01-29 2019-01-29 Football robot cooperation method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910083609.2A CN109794937B (en) 2019-01-29 2019-01-29 Football robot cooperation method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN109794937A CN109794937A (en) 2019-05-24
CN109794937B true CN109794937B (en) 2021-10-01

Family

ID=66559083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910083609.2A Active CN109794937B (en) 2019-01-29 2019-01-29 Football robot cooperation method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN109794937B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110280019A (en) * 2019-06-21 2019-09-27 南京邮电大学 Soccer robot Defending Policy based on intensified learning
CN110370295B (en) * 2019-07-02 2020-12-18 浙江大学 Small-sized football robot active control ball suction method based on deep reinforcement learning
CN111136659B (en) * 2020-01-15 2022-06-21 南京大学 Mechanical arm action learning method and system based on third person scale imitation learning
CN111781922B (en) * 2020-06-15 2021-10-26 中山大学 Multi-robot collaborative navigation method based on deep reinforcement learning
CN112008734B (en) * 2020-08-13 2021-10-15 中山大学 Robot control method and device based on component interaction degree
CN113312840B (en) * 2021-05-25 2023-02-17 广州深灵科技有限公司 Badminton playing method and system based on reinforcement learning
CN113467481B (en) * 2021-08-11 2022-10-25 哈尔滨工程大学 Path planning method based on improved Sarsa algorithm

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001191286A (en) * 1999-10-30 2001-07-17 Korea Advanced Inst Of Sci Technol Soccer robot controlling system using ir module
CN1394660A (en) * 2002-08-06 2003-02-05 哈尔滨工业大学 Full-automatic football robot and its intelligent control system
CN104063541A (en) * 2014-06-18 2014-09-24 南京邮电大学 Hierarchical decision making mechanism-based multirobot cooperation method
WO2014151926A2 (en) * 2013-03-15 2014-09-25 Brain Corporation Robotic training apparatus and methods
CN104865960A (en) * 2015-04-29 2015-08-26 山东师范大学 Multi-intelligent-body formation control method based on plane
CN106964145A (en) * 2017-03-28 2017-07-21 南京邮电大学 A kind of apery Soccer robot pass control method and team's ball-handling method
CN207198660U (en) * 2017-08-31 2018-04-06 安徽朗巴智能科技有限公司 The intelligence control system that novel football robot is independently shot
US9950421B2 (en) * 2010-07-02 2018-04-24 Softbank Robotics Europe Humanoid game-playing robot, method and system for using said robot
CN108563112A (en) * 2018-03-30 2018-09-21 南京邮电大学 Control method for emulating Soccer robot ball-handling

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001191286A (en) * 1999-10-30 2001-07-17 Korea Advanced Inst Of Sci Technol Soccer robot controlling system using ir module
CN1394660A (en) * 2002-08-06 2003-02-05 哈尔滨工业大学 Full-automatic football robot and its intelligent control system
US9950421B2 (en) * 2010-07-02 2018-04-24 Softbank Robotics Europe Humanoid game-playing robot, method and system for using said robot
WO2014151926A2 (en) * 2013-03-15 2014-09-25 Brain Corporation Robotic training apparatus and methods
CN104063541A (en) * 2014-06-18 2014-09-24 南京邮电大学 Hierarchical decision making mechanism-based multirobot cooperation method
CN104865960A (en) * 2015-04-29 2015-08-26 山东师范大学 Multi-intelligent-body formation control method based on plane
CN106964145A (en) * 2017-03-28 2017-07-21 南京邮电大学 A kind of apery Soccer robot pass control method and team's ball-handling method
CN207198660U (en) * 2017-08-31 2018-04-06 安徽朗巴智能科技有限公司 The intelligence control system that novel football robot is independently shot
CN108563112A (en) * 2018-03-30 2018-09-21 南京邮电大学 Control method for emulating Soccer robot ball-handling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Omnidirectional Walking and Team Cooperation of Soccer Robots in RoboCup3D Simulation; Shen Ping; China Master's Theses Full-text Database, Information Science and Technology; 2016-05-15; full text *

Also Published As

Publication number Publication date
CN109794937A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN109794937B (en) Football robot cooperation method based on reinforcement learning
Browning et al. STP: Skills, tactics, and plays for multi-robot control in adversarial environments
CN106352738A (en) Multi-missile cooperative guidance method based on output consistency
CN110928329A (en) Multi-aircraft track planning method based on deep Q learning algorithm
CN114063644B (en) Unmanned fighter plane air combat autonomous decision-making method based on pigeon flock reverse countermeasure learning
CN107803025A (en) Analogy method is aimed at and triggered during a kind of 3D high-precision reals
Xiang et al. Research on UAV swarm confrontation task based on MADDPG algorithm
Schwab et al. Learning skills for small size league robocup
Zhu et al. Learning primitive skills for mobile robots
Liu et al. Comparing heuristic search methods for finding effective group behaviors in RTS game
CN113741186A (en) Double-machine air combat decision method based on near-end strategy optimization
Shi et al. Research on self-adaptive decision-making mechanism for competition strategies in robot soccer
Vicerra et al. A multiple level MIMO fuzzy logic based intelligence for multiple agent cooperative robot system
Reis et al. Coordination in multi-robot systems: Applications in robotic soccer
Gorman et al. Imitative learning of combat behaviours in first-person computer games
CN107315349B (en) Ball hitting motion control method of robot
CN110280019A (en) Soccer robot Defending Policy based on intensified learning
CN104460668A (en) Method for improving soccer robot shooting efficiency
CN110711368B (en) Ball hitting method and device of table tennis robot
CN114757092A (en) System and method for training multi-agent cooperative communication strategy based on teammate perception
CN113377099A (en) Robot pursuit game method based on deep reinforcement learning
Chen et al. Commander-Soldiers Reinforcement Learning for Cooperative Multi-Agent Systems
Kober et al. Learning prioritized control of motor primitives
Li Design and implement of soccer player AI training system using unity ML-agents
He The Design of a Soccer Robot Game Strategy Based on Fuzzy Decision Algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant