WO2023044676A1 - Control method for multiple robots working cooperatively, system and robot - Google Patents


Info

Publication number
WO2023044676A1
WO2023044676A1 (PCT/CN2021/119981; CN2021119981W)
Authority
WO
WIPO (PCT)
Prior art keywords
robot
scene information
robots
neural network
network model
Prior art date
Application number
PCT/CN2021/119981
Other languages
French (fr)
Chinese (zh)
Inventor
Du Feng (杜峰)
Wu Jianqiang (吴剑强)
Li Tao (李韬)
Original Assignee
Siemens Ltd., China (西门子(中国)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Ltd., China (西门子(中国)有限公司)
Priority to PCT/CN2021/119981
Publication of WO2023044676A1

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00: Programme-controlled manipulators
    • B25J 9/16: Programme controls
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 19/00: Programme-control systems
    • G05B 19/02: Programme-control systems electric

Definitions

  • the embodiments of the present application relate to the technical field of industrial control, and in particular to a control method, system and robot for a plurality of robots working together.
  • for example, when two robots work together to complete a common task, the first robot 1 performs handling operations on the third object 103, the fourth object 104, the fifth object 105, and the sixth object 106, while the second robot 2 carries the first object 101, the second object 102, and the seventh object 107.
  • to avoid a collision between the first robot 1 and the second robot 2 in collision area A, real-time communication between the first robot 1 and the second robot 2 is required, which increases the control cost of multiple robots working together.
  • the embodiment of the present application provides a control scheme for a plurality of robots working together, which can realize a plurality of robots working together to complete a common task without real-time communication between the robots.
  • a method for controlling the operation of multiple robots, including: receiving scene information captured by each robot, the scene information including: local robot scene information and other robot scene information; using a control algorithm to calculate on the scene information to obtain the action command corresponding to each robot; and sending the corresponding action command to each robot, so that each robot executes the corresponding action command and the robots cooperate to complete at least one common task.
  • each robot captures scene information separately, and each robot uses a control algorithm to perform calculations based on the scene information and obtain its corresponding action command, so that each robot executes the corresponding action command and the robots work together to accomplish at least one common task.
  • the robots realize cooperative work through the captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them.
  • the embodiment of the present application reduces the control cost of multiple robots working together, is easy to maintain and upgrade, and is suitable for handling various common tasks.
  • the scene information includes: at least one of robot running image, robot running force, robot running distance, and robot running angle.
  • the action command includes: robot movement rotation angle and/or robot movement torque.
  • the robot can be precisely controlled to complete the task.
  • the scene information is captured by at least one sensor installed on each robot.
  • the scene information is captured by the sensor, which facilitates the analysis of the states of the robots working together, so as to control the robots to work.
  • the use of a control algorithm to calculate the scene information to obtain the action commands corresponding to each robot includes:
  • a control algorithm is used to compute, from the state of the local robot, the states of the other robots, and the time-series-based action set and state set, the action command corresponding to each robot.
  • the embodiment of the present application not only obtains the state of the local robot through the scene information, but also obtains the state of other robots, so as to avoid collisions between the local robot operating under the action command and other robots operating under the action command.
  • control algorithm is learned and obtained according to a deep reinforcement learning neural network model.
  • the deep reinforcement learning neural network model is obtained by training in a virtual environment with various robots; and/or the deep reinforcement learning neural network model is obtained by training multiple robots to perform various common tasks.
  • the deep reinforcement learning neural network model is obtained by training with mixed data of virtual data obtained in a virtual environment combined with actual data tested by a single robot on site.
  • the embodiment of the present application further improves the training effect of the deep reinforcement learning neural network model through various training samples.
  • the deep reinforcement learning neural network model is trained and obtained by using actual data of a plurality of robots tested in the field.
  • the embodiment of the present application further improves the training effect of the deep reinforcement learning neural network model through various training samples.
  • the deep reinforcement learning neural network model is a continuous or discrete function.
  • the deep reinforcement learning neural network model adopts an asynchronous structure during training.
  • an operation control system for multiple robots, including: multiple robots, each of which captures scene information, the scene information including: local robot scene information and other robot scene information; a control algorithm is used to calculate on the scene information to obtain the action command corresponding to each robot; and the robots execute the corresponding action commands and work together to complete at least one common task.
  • control algorithm is learned and obtained according to a deep reinforcement learning neural network model.
  • a robot sends the captured scene information to the controller, the scene information including: local robot scene information and other robot scene information; the controller uses a control algorithm to calculate on the scene information to obtain the action command corresponding to each robot; and the robot executes the corresponding action command and cooperates with other robots to complete at least one common task.
  • control algorithm is learned and obtained according to a deep reinforcement learning neural network model.
  • a computer program product, tangibly stored on a readable medium of a controller and containing computer-executable instructions which, when executed, cause at least one processor to execute any one of the above methods.
  • a computer-readable medium on which computer-executable instructions are stored, and when executed, the computer-executable instructions cause at least one processor to execute any one of the above-mentioned methods.
  • the controller receives the scene information captured by each robot, uses the control algorithm to calculate according to the scene information, and sends the corresponding action command to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task.
  • the robots realize cooperative work through the captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them.
  • the embodiment of the present application reduces the control cost of multiple robots working together, is easy to maintain and upgrade, and is suitable for handling various common tasks.
  • FIG. 1 is a schematic diagram of a system in which a plurality of robots work together according to an embodiment of the present application;
  • FIG. 2 is a flow chart of the steps of a method for a plurality of robots to work together according to an embodiment of the present application
  • FIG. 3 is a flow chart of step S2 of a method for a plurality of robots to work together according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a deep reinforcement learning neural network model of an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the actual training situation of the deep reinforcement learning neural network model of the embodiment of the present application.
  • S1 Each robot captures scene information separately, and the scene information includes: local robot scene information and other robot scene information
  • In the field of industrial control, in order to ensure low latency of data transmission, a controller is usually installed for each robot, and the control operation of the robot is realized by each robot's local controller. In some complex industrial control scenarios, multiple robots need to work together to complete common tasks, and the robots need to communicate in real time to avoid damage caused by collisions with each other. In this application, each robot captures scene information separately, and a control algorithm generates the action command corresponding to each robot based on the scene information, controlling multiple robots to work together without real-time communication between the robots.
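The flow just described (each robot captures scene information, a control algorithm turns it into per-robot action commands, and the robots execute them) can be sketched as follows. This is a minimal illustration only; the `Robot`, `SceneInfo`, and `control_algorithm` names and the one-dimensional workspace are assumptions, not the patent's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SceneInfo:
    # Scene information as described: local robot data plus other robots' data.
    local: dict
    others: list

@dataclass
class Robot:
    name: str
    executed: list = field(default_factory=list)

    def capture_scene(self, all_positions):
        # S1: the robot captures its own state and observes the other robots.
        local = {"robot": self.name, "position": all_positions[self.name]}
        others = [{"robot": n, "position": p}
                  for n, p in all_positions.items() if n != self.name]
        return SceneInfo(local=local, others=others)

    def execute(self, command):
        # S3: the robot executes the action command it was given.
        self.executed.append(command)

def control_algorithm(scene):
    # S2: placeholder policy on a 1-D line; wait when another robot is
    # adjacent, otherwise advance (a stand-in for the learned algorithm).
    pos = scene.local["position"]
    blocked = any(abs(o["position"] - pos) <= 1 for o in scene.others)
    return "wait" if blocked else "move_right"

# One control cycle for two robots sharing a workspace.
robots = [Robot("R1"), Robot("R2")]
positions = {"R1": 0, "R2": 5}
for r in robots:
    command = control_algorithm(r.capture_scene(positions))
    r.execute(command)
```

Because each command is computed from the captured scene alone, the two robots never exchange messages directly, mirroring the no-real-time-communication claim.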
  • the embodiment of the present application provides a control method for multiple robots to work together, including:
  • Step S1: Receive scene information captured by each robot, the scene information including: local robot scene information and other robot scene information.
  • each robot captures scene information through at least one sensor.
  • the sensor may be a camera or a laser sensor, and at least one sensor is installed at any position on the robot that is convenient for capturing scene information.
  • the sensor captures the scene information, which facilitates analysis of the state of each cooperating robot, so that each robot can be controlled to work.
  • the scene information can be captured at preset intervals, or can be captured continuously.
  • the specific capture method can be selected and set according to the needs of the common task completed by the collaborative work.
  • the scene information includes: at least one of robot running image, robot running force, robot running distance, and robot running angle.
  • Step S2: Use the control algorithm to calculate on the scene information and obtain the action command corresponding to each robot.
  • the action command includes: robot movement rotation angle and/or robot movement torque.
  • the embodiment of the present application can accurately control the robot to complete the task by controlling the rotation angle of the robot movement and/or the movement torque of the robot.
  • for example, when the robot completes a carrying task, it needs to carry the object to the target position by adjusting its movement rotation angle and movement torque.
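For illustration, an action command carrying a movement rotation angle and/or a movement torque could be modeled as a small record type; the field names and units here are hypothetical, not the patent's data format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ActionCommand:
    """An action command as described: a movement rotation angle and/or a
    movement torque. Field names and units are hypothetical."""
    rotation_angle_deg: Optional[float] = None
    torque_nm: Optional[float] = None

    def is_valid(self) -> bool:
        # At least one of the two quantities must be present.
        return self.rotation_angle_deg is not None or self.torque_nm is not None

# A carrying task might adjust both quantities to move an object to a target.
cmd = ActionCommand(rotation_angle_deg=15.0, torque_nm=2.5)
```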
  • step S2 includes:
  • the common tasks are usually analyzed by the task analysis module of the robot controller to obtain action sets and state sets based on time series, and the time interval of the specific time series is set according to the needs of the tasks.
  • a controller is usually installed for each robot, and the control operation of the robot is realized based on the local controller of each robot.
  • the action set includes: during the time period T1, the robot performs action one, and during the time period T2, the robot performs action two.
  • the state set includes: at time T3, the robot is in state one, and at time T4, the robot is in state two.
  • not only is the state of the local robot obtained through the scene information, but the states of the other robots are also obtained, so that collisions between the local robot operating under its action command and other robots operating under their action commands can be avoided.
  • common tasks are parsed into action sets and state sets based on time series, and then local robot states and other robot states are obtained according to scene information.
  • the local robot state and the other robot states are compared with the time-series-based action set and state set to obtain the action command that the local robot needs to execute. Therefore, in the embodiment of the present application, the action command corresponding to each robot is obtained by combining the local robot state and the other robot states with the time-series-based action set and state set, so that collisions are avoided when the robots operate under the control of the action commands.
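The combination described above, comparing the local robot state and other robot states against the time-series-based action set and state set to pick the next command, might be sketched like this (all data structures, keys, and command names are illustrative assumptions):

```python
# Time-series-based action set: which action each robot performs in which
# time period, as parsed from the common task.
action_set = {
    ("T1", "R1"): "pick_object",
    ("T2", "R1"): "place_object",
    ("T1", "R2"): "wait",
    ("T2", "R2"): "pick_object",
}

# Time-series-based state set: which state each robot is expected to be in.
state_set = {
    ("T1", "R1"): "near_conveyor",
    ("T2", "R1"): "near_target",
}

def next_command(period, robot, local_state, other_states):
    """Combine the local robot state and the other robots' states with the
    time-series action/state sets to pick the next action command."""
    expected = state_set.get((period, robot))
    # Hold if the local robot is not yet in its expected state.
    if expected is not None and local_state != expected:
        return "hold"
    # Hold if another robot occupies the same state (collision risk).
    if local_state in other_states:
        return "hold"
    return action_set.get((period, robot), "hold")

cmd = next_command("T1", "R1", "near_conveyor", other_states=["idle"])
```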
  • control algorithm is learned and obtained according to a deep reinforcement learning neural network model.
  • DRL: Deep Reinforcement Learning.
  • the deep reinforcement learning neural network model M of the embodiment of the present application generates the local robot action command a according to the input state s of the robots (the local robot and other robots); the result of operating the local robot under the action command, that is, the feedback information r, is sent to the deep reinforcement learning neural network model, and the model then generates a new local robot action command a' according to the feedback information r and the new local robot state s'.
  • the deep reinforcement learning neural network model M can generate local robot action commands that are more suitable for common tasks, and can gradually improve the deep reinforcement learning neural network model M according to the selection of training samples.
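The s → a → r → s' cycle above is the standard reinforcement-learning interaction loop. The following toy sketch substitutes a tabular value estimator for the deep neural network M, purely to make the loop concrete; the one-dimensional environment, reward, and action names are invented for illustration.

```python
class ToyModel:
    """Tabular stand-in for the deep reinforcement learning model M, purely
    for illustration: it maps a state s to an action command a and refines
    its estimates from the feedback information r."""
    def __init__(self):
        self.values = {}  # (state, action) -> estimated value

    def act(self, state):
        # Generate the action command a for the input state s.
        return max(["left", "right"],
                   key=lambda a: self.values.get((state, a), 0.0))

    def learn(self, state, action, reward):
        # Move the value estimate toward the observed feedback r.
        old = self.values.get((state, action), 0.0)
        self.values[(state, action)] = old + 0.5 * (reward - old)

def environment_step(state, action):
    # Toy workspace: the robot moves on a line and the target is position 3.
    new_state = state + (1 if action == "right" else -1)
    reward = -abs(3 - new_state)  # feedback r: closer to the target is better
    return new_state, reward

model = ToyModel()
# Training: try each action from each state and feed the result r back.
for _ in range(3):
    for s in range(3):
        for a in ["left", "right"]:
            _, r = environment_step(s, a)
            model.learn(s, a, r)

# Execution: the learned model now drives the s -> a -> r -> s' loop.
s, actions = 0, []
for _ in range(3):
    a = model.act(s)               # M generates command a from state s
    s, r = environment_step(s, a)  # executing yields feedback r and state s'
    actions.append(a)
```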
  • the deep reinforcement learning neural network model is obtained by training various robots in a virtual environment; and/or by training multiple robots to perform various common tasks in a virtual environment.
  • the embodiment of the present application uses data obtained by training various robots in the virtual environment as training samples for the deep reinforcement learning neural network model, and may also use data obtained by having multiple robots perform various common tasks in the virtual environment as training samples for the model.
  • the embodiment of the present application further improves the training effect of the deep reinforcement learning neural network model through various training samples.
  • the deep reinforcement learning neural network model is obtained by training with mixed data of virtual data obtained in a virtual environment combined with actual data tested by a single robot in the field.
  • the embodiment of the present application uses the mixed data obtained in the virtual environment combined with the actual data of the on-site test as the training sample of the deep reinforcement learning neural network model, which further improves the training effect of the deep reinforcement learning neural network model.
  • the deep reinforcement learning neural network model is trained and obtained by using actual data of multiple robots tested in the field.
  • the embodiment of the present application uses the actual data of multiple robots tested on site as the training samples for the deep reinforcement learning neural network model.
  • the embodiment of the present application further improves the training effect of the deep reinforcement learning neural network model through various training samples.
  • in order to expand the application range of the deep reinforcement learning neural network model so that it can improve the performance of more common tasks, the model is a continuous or discrete function.
  • the deep reinforcement learning neural network model adopts an asynchronous structure in training, thereby reducing the complexity of the application of the deep reinforcement learning neural network model, and making the embodiments of the present application easier to implement.
  • the deep reinforcement learning neural network model can first be trained on simple common tasks and then on complex common tasks, gradually increasing the complexity of the training samples, which helps the deep reinforcement learning neural network model of the embodiment of the present application achieve high accuracy on complex common tasks.
  • Step S3 sending corresponding action commands to each robot, so that each robot executes the corresponding action command, and cooperates to complete at least one common task.
  • the robots in the embodiment of the present application realize cooperative work by executing their corresponding action commands, and the cooperative work can be realized without real-time communication between the robots.
  • each robot captures scene information separately, and each robot uses a control algorithm to perform calculations based on the scene information and obtain its corresponding action command, so that each robot executes the corresponding action command and the robots work together to accomplish at least one common task.
  • the robots realize cooperative work through the captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them.
  • the embodiment of the present application reduces the control cost of multiple robots working together, is easy to maintain and upgrade, and is suitable for handling various common tasks.
  • each robot captures the local robot scene information and other robot scene information D1, ..., DN and inputs it into the corresponding deep reinforcement learning neural network model M1, ..., MN.
  • Scene information includes: robot running image, robot running force, robot running distance, and robot running angle.
  • the N deep reinforcement learning neural network models M1, ..., MN corresponding to the N robots R1, ..., RN respectively receive the local robot scene information and other robot scene information D1, ..., DN sent by the N robots R1, ..., RN, and communicate with each other.
  • the common task 501 is parsed by the task parsing module 502 into a time-series-based action set and state set 503 (which includes action sets and state sets), and is sent to the N deep reinforcement learning neural network models M1, ..., MN corresponding to the N robots R1, ..., RN.
  • the N deep reinforcement learning neural network models M1, ..., MN corresponding to the N robots R1, ..., RN conduct end-to-end training with the time-series-based action set and state set 503 to obtain the action commands A1, ..., AN executed by each robot R1, ..., RN.
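The arrangement described, N robots R1, ..., RN each with a corresponding model Mi that receives the shared scene information and emits that robot's command Ai, can be sketched as follows, with toy collision-avoidance policies standing in for the trained models (all names and the 1-D positions are illustrative assumptions):

```python
def make_model(robot_index):
    """Toy stand-in for the trained model M_i of robot R_i: it reads the
    shared scene information D (positions of all robots on a line) and
    outputs the action command A_i for its own robot only."""
    def model(scene):
        my_pos = scene[robot_index]
        other_positions = scene[:robot_index] + scene[robot_index + 1:]
        # Stop if moving forward would enter a cell another robot occupies.
        return "stop" if my_pos + 1 in other_positions else "forward"
    return model

n_robots = 3
models = [make_model(i) for i in range(n_robots)]  # M_1, ..., M_N
scene = [0, 1, 5]                                  # D: positions of R_1, ..., R_N

# Each model independently derives its robot's command from the same scene,
# so no robot-to-robot communication is needed at run time.
commands = [m(scene) for m in models]              # A_1, ..., A_N
```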
  • the training samples of the deep reinforcement learning neural network model may use data obtained by training various robots in a virtual environment, and/or data obtained by training multiple robots to perform various common tasks in a virtual environment.
  • the training samples of the deep reinforcement learning neural network model may use the mixed data of virtual data obtained in a virtual environment combined with actual data of on-site testing.
  • the training samples of the deep reinforcement learning neural network model may use actual data from field tests of multiple robots.
  • the embodiment of the present application further improves the training effect of the deep reinforcement learning neural network model through various training samples.
  • the deep reinforcement learning neural network model is a continuous or discrete function.
  • the deep reinforcement learning neural network model adopts an asynchronous structure in training, thereby reducing the complexity of the application of the deep reinforcement learning neural network model, and making the embodiment of the present application easier to implement.
  • the deep reinforcement learning neural network model can first be trained on simple common tasks and then on complex common tasks, gradually increasing the complexity of the training samples, which helps the deep reinforcement learning neural network model of the embodiments of the present application achieve high accuracy on complex common tasks.
  • the embodiment of the present application continuously improves the accuracy of the output action commands of the deep reinforcement learning neural network model through the training of the deep reinforcement learning neural network model, improves the efficiency of the collaborative work of multiple robots, and simplifies the complexity of the collaborative work control of multiple robots.
  • some embodiments of the present application also provide a control system for multiple robots working together, including: multiple robots, each of which captures scene information respectively, and the scene information includes: local robot scene information and other robot scene information;
  • the controller of each robot uses the control algorithm to calculate on the scene information to obtain the action command corresponding to each robot, and sends the corresponding action command to each robot so that each robot executes the corresponding action command and the robots work together to complete at least one common task.
  • control algorithm is learned and obtained according to the deep reinforcement learning neural network model.
  • the control algorithm is integrated in the memory of each robot. As needed, each controller calls the control algorithm to calculate on the scene information and obtain the action command corresponding to each robot, realizing low-latency control of each robot and ensuring the accuracy of the action command.
  • control system in which a plurality of robots work together in this embodiment is used to implement the corresponding control methods in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
  • each robot in the control system of this embodiment reference may be made to the descriptions of corresponding parts in the foregoing method embodiments, and details are not repeated here.
  • each robot captures scene information separately, and each robot uses a control algorithm to perform calculations based on the scene information and obtain its corresponding action command, so that each robot executes the corresponding action command and the robots work together to accomplish at least one common task.
  • the robots realize cooperative work through the captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them.
  • the embodiment of the present application reduces the control cost of multiple robots working together, is easy to maintain and upgrade, and is suitable for handling various common tasks.
  • some embodiments of the present application also provide a robot.
  • the robot sends the captured scene information to the controller.
  • the scene information includes: local robot scene information and other robot scene information; the controller uses a control algorithm to calculate on the scene information and obtain the action command corresponding to each robot; the robot executes the corresponding action command and cooperates with other robots to complete at least one common task.
  • control algorithm is learned and obtained according to the deep reinforcement learning neural network model.
  • the robot in this embodiment is used to implement the corresponding control methods in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
  • each robot captures scene information separately, and each robot uses a control algorithm to perform calculations based on the scene information and obtain its corresponding action command, so that each robot executes the corresponding action command and the robots work together to accomplish at least one common task.
  • the robots realize cooperative work through the captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them.
  • the embodiment of the present application reduces the control cost of multiple robots working together, is easy to maintain and upgrade, and is suitable for handling various common tasks.
  • some embodiments of the present application further provide a computer program product, which is tangibly stored on a readable medium of the controller and has computer-executable instructions. When executed, the computer-executable instructions cause at least one processor to implement any of the above methods.
  • some embodiments of the present application further provide a computer-readable medium on which computer-executable instructions are stored, and when executed, the computer-executable instruction causes at least one processor to execute the above-mentioned method.
  • each component/step described in the embodiments of the present application can be divided into more components/steps, and two or more components/steps or partial operations of components/steps can also be combined into new components/steps to achieve the purpose of the embodiments of the present application.
  • the above-mentioned method according to the embodiments of the present application can be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code originally stored on a remote recording medium or a non-transitory machine-readable medium and downloaded over a network to be stored on a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA.
  • a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods described herein.
  • the execution of the code converts the general-purpose computer into a special-purpose computer for executing the methods shown herein.

Abstract

Disclosed are a control method for multiple robots working cooperatively, a system and a robot. An operation control method comprises: receiving scene information captured from each robot, the scene information comprising local robot scene information and other robot scene information (S1); calculating the scene information using a control algorithm, to obtain an action command corresponding to each robot (S2); and sending the corresponding action command to each robot, so that each robot executes the corresponding action command, and at least one common task is completed cooperatively (S3). In this way, multiple robots can work together to complete a common task without real-time communication between the robots.

Description

一种多个机器人协同工作的控制方法、系统及机器人A control method, system and robot for multiple robots working together 技术领域technical field
本申请实施例涉及工业控制技术领域,尤其涉及一种多个机器人协同工作的控制方法、系统及机器人。The embodiments of the present application relate to the technical field of industrial control, and in particular to a control method, system and robot for a plurality of robots working together.
背景技术Background technique
随着工业控制的发展,越来越多的任务采用机器人(包括机器臂,统一简称为机器人)来完成,对机器人进行控制实现多种任务场景成为未来工业控制技术发展的关键。但是现有的机器人控制通常需要按照预先设定的程序完成对应的任务,当需要多个机器人协同工作完成共同任务时,往往需要进行大量的编程工作,为任务的完成带来极大困难。并且由于机器人的价格昂贵,多个机器人协同工作完成共同任务时,即便采用预先设定的程序也仍然需要多个机器人之间进行实时通信,以避免机器人在工作中发生碰撞所造成的机器人损伤。With the development of industrial control, more and more tasks are completed by robots (including robot arms, collectively referred to as robots). Controlling robots to achieve various task scenarios has become the key to the development of future industrial control technology. However, the existing robot control usually needs to complete the corresponding tasks according to the preset program. When multiple robots are required to work together to complete a common task, a large amount of programming work is often required, which brings great difficulties to the completion of the task. And because the robot is expensive, when multiple robots work together to complete a common task, even if a pre-set program is used, real-time communication between multiple robots is still required to avoid robot damage caused by collisions between robots during work.
示例性地,参见图1,当两个机器人协同工作完成共同任务时,第一机器人1对第三物体103、第四物体104、第五物体105、第六物体106进行搬运操作,第二机器人2对第一物体101、第二物体102、第七物体107进行搬运操作。为了避免第一机器人1和第二机器人2在碰撞区域A发生碰撞,需要第一机器人1和第二机器人2之间进行实时通信,增加了多个机器人协同工作的控制成本。For example, referring to FIG. 1, when two robots work together to complete a common task, the first robot 1 performs handling operations on the third object 103, the fourth object 104, the fifth object 105, and the sixth object 106, and the second robot 1 2 Carrying the first object 101, the second object 102, and the seventh object 107. In order to avoid the collision between the first robot 1 and the second robot 2 in the collision area A, real-time communication between the first robot 1 and the second robot 2 is required, which increases the control cost of multiple robots working together.
发明内容Contents of the invention
In view of this, embodiments of the present application provide a control scheme for multiple robots working together, which enables multiple robots to work together to complete a common task without real-time communication between the robots.
According to a first aspect of the embodiments of the present application, an operation control method for multiple robots is provided, including: receiving scene information captured by each robot, the scene information including local robot scene information and other-robot scene information; calculating the scene information using a control algorithm to obtain an action command corresponding to each robot; and sending the corresponding action command to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task.
According to the operation control scheme for multiple robots provided by the embodiments of the present application, each robot captures scene information, and each robot uses a control algorithm to perform calculations based on the scene information to obtain its corresponding action command, so that each robot executes its corresponding action command and the robots work together to complete at least one common task. In the embodiments of the present application, the robots achieve cooperative work through captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them. The embodiments of the present application reduce the control cost of multiple robots working together, are easy to maintain and upgrade, and are suitable for handling a variety of common tasks.
In some embodiments of the present application, the scene information includes at least one of: a robot running image, a robot running force, a robot running distance, and a robot running angle.
In this way, the state of each robot can be accurately known, allowing each robot to be controlled more accurately in its work.
In some embodiments of the present application, the action command includes a robot movement rotation angle and/or a robot movement torque.
In this way, the robot can be precisely controlled to complete the task.
In some embodiments of the present application, the scene information is captured by at least one sensor installed on each robot.
In this way, scene information is captured by sensors, which facilitates analysis of the state of each cooperating robot so that each robot can be controlled in its work.
In some embodiments of the present application, calculating the scene information using the control algorithm to obtain the action command corresponding to each robot includes:
obtaining a common task and parsing the common task into a time-series-based action set and state set;
obtaining the local robot state and the other robots' states according to the scene information; and
calculating the local robot state, the other robots' states, and the time-series-based action set and state set using the control algorithm, to obtain the action command corresponding to each robot.
In this way, the embodiments of the present application obtain not only the local robot state but also the states of the other robots from the scene information, so that collisions between the local robot operating under its action command and other robots operating under their action commands can be avoided.
In some embodiments of the present application, the control algorithm is learned from a deep reinforcement learning neural network model.
In this way, local robot action commands that better satisfy the common task can be generated, and the deep reinforcement learning neural network model can be gradually refined according to the selection of training samples.
In some embodiments of the present application, the deep reinforcement learning neural network model is obtained by training various robots in a virtual environment; and/or the deep reinforcement learning neural network model is obtained by training multiple robots performing various common tasks in a virtual environment.
In this way, the training effect of the deep reinforcement learning neural network model is further improved through various training samples.
In some embodiments of the present application, the deep reinforcement learning neural network model is obtained by training on mixed data combining virtual data obtained in a virtual environment with actual data from field tests of a single robot.
In this way, the embodiments of the present application further improve the training effect of the deep reinforcement learning neural network model through various training samples.
In some embodiments of the present application, the deep reinforcement learning neural network model is obtained by training on actual data from field tests of multiple robots.
In this way, the embodiments of the present application further improve the training effect of the deep reinforcement learning neural network model through various training samples.
In some embodiments of the present application, the deep reinforcement learning neural network model is a continuous or discrete function.
In this way, the scope of application of the deep reinforcement learning neural network model is expanded, so that it can improve the performance of more common tasks.
In some embodiments of the present application, the deep reinforcement learning neural network model adopts an asynchronous structure during training.
In this way, the complexity of applying the deep reinforcement learning neural network model is reduced, making the embodiments of the present application easier to implement.
According to a second aspect of the embodiments of the present application, an operation control system for multiple robots is provided, including multiple robots, where each robot captures scene information, the scene information including local robot scene information and other-robot scene information; a control algorithm calculates the scene information to obtain an action command corresponding to each robot; and each robot executes its corresponding action command, and the robots work together to complete at least one common task.
In some embodiments of the present application, the control algorithm is learned from a deep reinforcement learning neural network model.
In this way, local robot action commands that better satisfy the common task can be generated, and the deep reinforcement learning neural network model can be gradually refined according to the selection of training samples.
According to a third aspect of the embodiments of the present application, a robot is provided. The robot sends captured scene information to a controller, the scene information including local robot scene information and other-robot scene information; the controller calculates the scene information using a control algorithm to obtain an action command corresponding to each robot; and the robot executes its corresponding action command and works together with other robots to complete at least one common task.
In some embodiments of the present application, the control algorithm is learned from a deep reinforcement learning neural network model.
In this way, local robot action commands that better satisfy the common task can be generated, and the deep reinforcement learning neural network model can be gradually refined according to the selection of training samples.
According to a fourth aspect of the embodiments of the present application, a computer program product is provided, which is tangibly stored on a readable medium of a controller and includes computer-executable instructions that, when executed, cause at least one processor to perform any one of the above methods.
According to a fifth aspect of the embodiments of the present application, a computer-readable medium is provided, on which computer-executable instructions are stored, the computer-executable instructions, when executed, causing at least one processor to perform any one of the above methods.
According to the operation control scheme for multiple robots provided by the embodiments of the present application, the controller receives the scene information captured by each robot, performs calculations based on the scene information using a control algorithm, and sends a corresponding action command to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task. In the embodiments of the present application, the robots achieve cooperative work through captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them. The embodiments of the present application reduce the control cost of multiple robots working together, are easy to maintain and upgrade, and are suitable for handling a variety of common tasks.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show merely some of the embodiments described in the embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings.
FIG. 1 is a schematic diagram of a system in which multiple robots work together, to which embodiments of the present application apply;
FIG. 2 is a flowchart of the steps of a method for multiple robots to work together according to an embodiment of the present application;
FIG. 3 is a flowchart of step S2 of a method for multiple robots to work together according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the deep reinforcement learning neural network model of an embodiment of the present application;
FIG. 5 is a schematic diagram of the actual training situation of the deep reinforcement learning neural network model of an embodiment of the present application.
Reference Signs
1: first robot
2: second robot
101: first object
102: second object
103: third object
104: fourth object
105: fifth object
106: sixth object
107: seventh object
S1: each robot captures scene information, the scene information including local robot scene information and other-robot scene information
S2: calculate the scene information using a control algorithm to obtain the action command corresponding to each robot
S21: obtain a common task and parse the common task into a time-series-based action set and state set
S22: obtain the local robot state and the other robots' states according to the scene information
S23: calculate the local robot state, the other robots' states, and the time-series-based action set and state set using the control algorithm, to obtain the action command corresponding to each robot
S3: each robot executes its corresponding action command, and the robots work together to complete at least one common task
M: deep reinforcement learning neural network model
s: input robot state (of the local robot and the other robots)
a: generated local robot action command
r: feedback information
s': new local robot state
a': new local robot action command
R1, …, RN: N robots
M1, …, MN: N deep reinforcement learning neural network models
D1, …, DN: N sets of local robot scene information and other-robot scene information
501: common task
502: task parsing module
503: time-series-based action set and state set
A1, …, AN: N action commands
Detailed Description of Embodiments
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the drawings of the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.
In the field of industrial control, in order to ensure low latency of data transmission, a controller is usually installed for each robot, and the robot's control operations are performed by each robot's local controller. Some complex industrial control scenarios require multiple robots to work together to complete a common task, and the robots also need real-time communication with each other to avoid robot damage caused by collisions. In the present application, each robot captures scene information, and a control algorithm generates an action command corresponding to each robot based on the scene information to control multiple robots to work together, without real-time communication between the robots.
The specific implementation of the embodiments of the present application is further described below in conjunction with the accompanying drawings.
Referring to FIG. 2, an embodiment of the present application provides a control method for multiple robots working together, including:
Step S1: receive scene information captured by each robot, the scene information including local robot scene information and other-robot scene information.
In some specific embodiments of the present application, each robot captures scene information through at least one sensor.
Specifically, the sensor may be a camera or a laser sensor, and the at least one sensor may be installed at any position on the robot that is convenient for capturing scene information.
In the embodiments of the present application, scene information is captured by sensors, which facilitates analysis of the state of each cooperating robot so that each robot can be controlled in its work.
In the embodiments of the present application, scene information may be captured at preset intervals or continuously; the specific capture mode can be selected and set according to the needs of the common task to be completed cooperatively.
In some specific embodiments of the present application, the scene information includes at least one of: a robot running image, a robot running force, a robot running distance, and a robot running angle.
In the embodiments of the present application, by capturing at least one of the robot running image, robot running force, robot running distance and robot running angle, the state of the robot can be accurately known, allowing each robot to be controlled more accurately in its work.
Step S2: calculate the scene information using a control algorithm to obtain the action command corresponding to each robot.
In some specific embodiments of the present application, the action command includes a robot movement rotation angle and/or a robot movement torque, and the like.
In the embodiments of the present application, the robot can be accurately controlled to complete its task by controlling the robot movement rotation angle and/or the robot movement torque.
For example, for a robot to complete a carrying task, the rotation angle and torque of the robot's movement need to be adjusted to carry the object to the target position.
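The scene information and action commands described above can be represented by simple data structures. The following is a minimal sketch; all field names and units are assumptions for illustration, not part of the original disclosure:

```python
from dataclasses import dataclass

@dataclass
class SceneInfo:
    """Scene information captured by a robot's sensors."""
    image: bytes      # robot running image (e.g. a camera frame)
    force: float      # robot running force
    distance: float   # robot running distance
    angle: float      # robot running angle

@dataclass
class ActionCommand:
    """Action command sent to a robot."""
    rotation_angle: float  # robot movement rotation angle, in degrees
    torque: float          # robot movement torque, in N*m

# e.g. one carrying step: rotate 30 degrees with 1.5 N*m of torque
cmd = ActionCommand(rotation_angle=30.0, torque=1.5)
```

In practice a command like `cmd` would be computed per robot from the captured scene information and sent over whatever control interface the robot exposes.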
In some specific embodiments of the present application, referring to FIG. 3, step S2 includes:
S21: obtain a common task and parse the common task into a time-series-based action set and state set.
Specifically, the common task is usually parsed by the task parsing module of the robot controller to obtain the time-series-based action set and state set, and the time interval of the specific time series is set according to the needs of the task.
In order to ensure low latency of data transmission, a controller is usually installed for each robot, and the robot's control operations are performed by each robot's local controller.
For example, the action set includes: during time period T1, the robot performs action one; during time period T2, the robot performs action two.
For example, the state set includes: at time T3, the robot is in state one; at time T4, the robot is in state two.
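Following the examples above (time slots T1 and T2 for actions, T3 and T4 for states), the parsing step S21 can be sketched as follows; the helper function and its input format are assumptions for illustration:

```python
def parse_common_task(task_steps):
    """Parse a common task, given as (time_key, kind, value) tuples,
    into a time-series-based action set and state set."""
    action_set, state_set = {}, {}
    for time_key, kind, value in task_steps:
        if kind == "action":
            action_set[time_key] = value
        elif kind == "state":
            state_set[time_key] = value
    return action_set, state_set

action_set, state_set = parse_common_task([
    ("T1", "action", "action one"),  # during T1, perform action one
    ("T2", "action", "action two"),  # during T2, perform action two
    ("T3", "state", "state one"),    # at T3, the robot is in state one
    ("T4", "state", "state two"),    # at T4, the robot is in state two
])
```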
S22: obtain the local robot state and the other robots' states according to the scene information.
The embodiments of the present application obtain not only the local robot state but also the states of the other robots from the scene information, so that collisions between the local robot operating under its action command and other robots operating under their action commands can be avoided.
S23: calculate the local robot state, the other robots' states, and the time-series-based action set and state set using the control algorithm, to obtain the action command corresponding to each robot.
In the embodiments of the present application, the common task is parsed into a time-series-based action set and state set, and then the local robot state and the other robots' states are obtained according to the scene information. For the local robot, the local robot state and the other robots' states are compared with the time-series-based action set and state set to obtain the action command that the local robot needs to execute. Therefore, the embodiments of the present application obtain the action command corresponding to each robot by combining the local robot state and the other robots' states with the time-series-based action set and state set, avoiding collisions when the robots operate under the control of their action commands.
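The comparison just described, matching the current states against the parsed sets to pick the local robot's next command, might be sketched as follows; the collision check and all names are illustrative assumptions, not the disclosed algorithm itself:

```python
def next_action(local_state, other_states, action_set, state_set, shared_zone):
    """Pick the local robot's next action command by comparing states
    against the time-series-based action set and state set.

    If any other robot currently occupies the shared collision zone,
    the local robot waits instead; no robot-to-robot messaging is used,
    since the other robots' states come from captured scene information.
    """
    if shared_zone in other_states:
        return "wait"
    # find the time slot whose expected state matches the local state
    for time_key in sorted(state_set):
        if state_set[time_key] == local_state and time_key in action_set:
            return action_set[time_key]
    return "idle"

action_set = {"T1": "pick up object", "T2": "place object"}
state_set = {"T1": "near object", "T2": "holding object"}
```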
In some specific embodiments of the present application, the control algorithm is learned from a deep reinforcement learning neural network model.
Deep Reinforcement Learning (DRL) is a branch of deep learning that has developed rapidly over the past two years; its purpose is to take a computer from perception to decision-making and control, so as to realize general artificial intelligence.
Referring to FIG. 4, the deep reinforcement learning neural network model M of the embodiment of the present application generates a local robot action command a according to the input robot state s (of the local robot and the other robots). The result of the local robot operating under the action command, i.e. the feedback information r, is sent to the deep reinforcement learning neural network model, and the model then generates a new local robot action command a' according to the feedback information r and the new local robot state s'. Through such repeated training, the deep reinforcement learning neural network model M can generate local robot action commands that better satisfy the common task, and the model M can be gradually refined according to the selection of training samples.
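The s, a, r, s', a' loop just described is the standard reinforcement learning cycle. The following tabular Q-learning sketch illustrates that cycle on a toy three-state task; the environment, rewards and parameters are assumptions for illustration, and the actual disclosure replaces the Q-table with a deep neural network:

```python
import random

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Toy reinforcement learning loop: state 2 is the goal,
    action 1 advances one state, action 0 stays in place."""
    rng = random.Random(seed)
    states, actions = (0, 1, 2), (0, 1)
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = 0                                  # initial robot state s
        while s != 2:
            # generate an action command a for the current state s
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(s, x)])
            s_next = s + 1 if a == 1 else s    # new state s'
            r = 1.0 if s_next == 2 else 0.0    # feedback information r
            # update the model from (s, a, r, s'), then continue from s'
            best_next = max(q[(s_next, x)] for x in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q

q = train()
```

After training, the learned values favor the advancing action in every non-goal state, which is the refinement effect the repeated s, a, r, s' feedback is meant to produce.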
In some specific embodiments of the present application, the deep reinforcement learning neural network model is obtained by training various robots in a virtual environment; and/or by training multiple robots performing various common tasks in a virtual environment.
The embodiments of the present application use data obtained by training various robots in a virtual environment as training samples for the deep reinforcement learning neural network model; the embodiments may also use data obtained by multiple robots performing various common tasks in a virtual environment as training samples. Through various training samples, the embodiments of the present application further improve the training effect of the deep reinforcement learning neural network model.
In some specific embodiments of the present application, the deep reinforcement learning neural network model is obtained by training on mixed data combining virtual data obtained in a virtual environment with actual data from field tests of a single robot.
The embodiments of the present application use mixed data combining virtual data obtained in a virtual environment with actual data from field tests as training samples for the deep reinforcement learning neural network model, further improving its training effect.
In some specific embodiments of the present application, the deep reinforcement learning neural network model is obtained by training on actual data from field tests of multiple robots.
The embodiments of the present application use actual data from field tests of multiple robots as training samples for the deep reinforcement learning neural network model. Through various training samples, the embodiments of the present application further improve the training effect of the model.
In some specific embodiments of the present application, in order to expand the scope of application of the deep reinforcement learning neural network model so that it can improve the performance of more common tasks, the deep reinforcement learning neural network model is a continuous or discrete function.
In some specific embodiments of the present application, the deep reinforcement learning neural network model adopts an asynchronous structure during training, thereby reducing the complexity of applying the model and making the embodiments of the present application easier to implement.
In the embodiments of the present application, the deep reinforcement learning neural network model can first be trained on simple common tasks and then on complex common tasks, gradually increasing the complexity of the training samples, which helps the model achieve accuracy on complex common tasks.
Step S3: send the corresponding action command to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task.
In the embodiments of the present application, the robots achieve cooperative work by executing their corresponding action commands, without real-time communication between the robots.
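Putting steps S1 to S3 together, one control cycle can be sketched as below. The stand-in control algorithm (a simple distance rule) and all names are assumptions for illustration; the disclosure instead uses an algorithm learned by a deep reinforcement learning neural network model:

```python
def control_step(scene_infos, control_algorithm):
    """One control cycle: receive each robot's scene information (S1),
    compute each robot's action command (S2), and return the commands
    to be sent (S3). No robot-to-robot messaging is involved.

    scene_infos maps robot id -> (local scene info, other robots' scene info).
    """
    return {
        robot_id: control_algorithm(local_info, others_info)
        for robot_id, (local_info, others_info) in scene_infos.items()
    }

def keep_distance(local_pos, other_positions, min_gap=0.5):
    """Stand-in control algorithm: retreat if any other robot is too close."""
    gap = min((abs(local_pos - p) for p in other_positions), default=min_gap)
    return "retreat" if gap < min_gap else "advance"

commands = control_step({"R1": (0.0, [2.0]), "R2": (2.0, [0.0])}, keep_distance)
```

Because each robot's command is computed from its own captured view of the scene, swapping `keep_distance` for a learned model changes nothing in the surrounding cycle.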
According to the operation control scheme for multiple robots provided by the embodiments of the present application, each robot captures scene information, and each robot uses a control algorithm to perform calculations based on the scene information to obtain its corresponding action command, so that each robot executes its corresponding action command and the robots work together to complete at least one common task. In the embodiments of the present application, the robots achieve cooperative work through captured scene information and the control algorithm, and collisions between the robots are avoided without real-time communication between them. The embodiments of the present application reduce the control cost of multiple robots working together, are easy to maintain and upgrade, and are suitable for handling a variety of common tasks.
Referring to FIG. 5, actual data from field tests of N robots R1, …, RN are used as training samples for the deep reinforcement learning neural network models M1, …, MN. Each robot inputs its captured local robot scene information and other-robot scene information D1, …, DN into the corresponding deep reinforcement learning neural network model M1, …, MN. The scene information includes the robot running image, robot running force, robot running distance and robot running angle. The N deep reinforcement learning neural network models M1, …, MN corresponding to the N robots R1, …, RN respectively receive the local robot scene information and other-robot scene information D1, …, DN sent by the N robots R1, …, RN, and communicate with each other. The common task 501 is parsed by the task parsing module 502 into a time-series-based action set and state set 503 (which includes an action set and a state set), and, after rule checking, is sent to the N deep reinforcement learning neural network models M1, …, MN corresponding to the N robots R1, …, RN. The N deep reinforcement learning neural network models M1, …, MN corresponding to the N robots R1, …, RN are trained end-to-end according to the local robot scene information and other-robot scene information D1, …, DN and the time-series-based action set and state set 503, to obtain the action commands A1, …, AN executed by the robots R1, …, RN.
In the embodiments of the present application, the training samples for the deep reinforcement learning neural network models may be data obtained by training various robots in a virtual environment, and/or data obtained by multiple robots performing various common tasks in a virtual environment.
The training samples for the deep reinforcement learning neural network models may also be mixed data combining virtual data obtained in a virtual environment with actual data from field tests.
The training samples for the deep reinforcement learning neural network models may also be actual data from field tests of multiple robots.
Through various training samples, the embodiments of the present application further improve the training effect of the deep reinforcement learning neural network models.
To broaden the range of applications of the deep reinforcement learning neural network model, so that it can improve the execution of more common tasks, the model may be a continuous or a discrete function.

The deep reinforcement learning neural network model adopts an asynchronous structure during training, which reduces the complexity of applying the model and makes the embodiments of the present application easier to implement.
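An asynchronous training structure of the kind mentioned above (A3C-style training is one well-known instance) can be illustrated minimally: several workers compute updates from their own experience and apply them to shared parameters without waiting for one another. The application does not fix a particular algorithm, so the gradient value and parameter layout below are purely illustrative.

```python
import threading

shared_params = {"w": 0.0}
lock = threading.Lock()

def worker(steps):
    for _ in range(steps):
        grad = 0.01  # stand-in for a locally computed policy gradient
        with lock:   # only the parameter update itself is serialized
            shared_params["w"] += grad

# Four workers train concurrently against the shared parameters.
threads = [threading.Thread(target=worker, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the asynchronous structure is that no global synchronization barrier is needed between workers; each contributes updates at its own pace, which keeps the training machinery simple.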
In the embodiments of the present application, the deep reinforcement learning neural network model can first be trained on simple common tasks and then on complex common tasks, gradually increasing the complexity of the training samples; this helps the model execute complex common tasks accurately.

Through such training, the embodiments of the present application continuously improve the accuracy of the action commands output by the deep reinforcement learning neural network model, raise the efficiency with which multiple robots work together, and simplify the control of their cooperative work.
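The simple-to-complex training schedule described above amounts to a curriculum over common tasks. A minimal sketch of such a schedule follows; the `complexity` ranking and the task names are illustrative assumptions, not part of the application.

```python
def curriculum(tasks):
    """Order training tasks from simple to complex before feeding the model."""
    return sorted(tasks, key=lambda t: t["complexity"])

tasks = [
    {"name": "dual_arm_assembly", "complexity": 3},
    {"name": "single_pick_place", "complexity": 1},
    {"name": "synchronized_transport", "complexity": 2},
]
schedule = [t["name"] for t in curriculum(tasks)]
```

Training would then iterate over `schedule` in order, so the model only sees a harder cooperative task after it has been trained on the easier ones.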
Corresponding to the above method, some embodiments of the present application further provide a control system in which multiple robots work cooperatively, comprising: multiple robots, each of which captures scene information, the scene information including local robot scene information and other robot scene information;

The controller of each robot uses the control algorithm to compute on the scene information and obtain the action command corresponding to that robot, and sends the corresponding action command to the robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task.

Specifically, the control algorithm is learned from a deep reinforcement learning neural network model.

The control algorithm is integrated in the memory of each robot; whenever needed, each controller invokes the control algorithm to compute on the scene information and obtain the robot's corresponding action command. This enables low-latency control of each robot and ensures the accuracy of the action commands.
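The deployment just described — the learned control algorithm stored locally with each robot and invoked directly by its controller, with no round trip to a central server — can be sketched as below. The class, the policy, and the command fields are hypothetical; the rotation-angle/torque fields merely echo the kinds of action commands named elsewhere in the application.

```python
class RobotController:
    """Sketch of an on-robot controller with an embedded control algorithm."""
    def __init__(self, robot_id, control_algorithm):
        self.robot_id = robot_id
        self.control_algorithm = control_algorithm  # loaded from local memory

    def step(self, scene_info):
        # scene_info carries both local and other-robot observations;
        # the call is local, so control latency stays low.
        command = self.control_algorithm(scene_info)
        return {"robot": self.robot_id, "command": command}

# Stand-in for a trained policy: emit a rotation angle and a torque.
policy = lambda scene: {"rotation_deg": scene["gap_angle"], "torque_nm": 1.5}

controller = RobotController(robot_id=0, control_algorithm=policy)
cmd = controller.step({"gap_angle": 30.0, "others": []})
```

Because inference happens inside `step` on the robot itself, updating the system reduces to replacing the stored `control_algorithm`, which is consistent with the maintenance and upgrade advantages claimed for the scheme.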
The control system of this embodiment, in which multiple robots work cooperatively, is used to implement the corresponding control methods in the foregoing method embodiments and has the beneficial effects of those embodiments, which are not repeated here. Likewise, for the implementation of each robot in the control system of this embodiment, reference may be made to the descriptions of the corresponding parts of the foregoing method embodiments, which are not repeated here either.

According to the operation control scheme for multiple robots provided in the embodiments of the present application, each robot captures scene information, and a control algorithm computes on that scene information to obtain the action command corresponding to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task. Because cooperative work is achieved through captured scene information and the control algorithm, collisions between robots can be avoided without real-time communication between them. The embodiments of the present application reduce the control cost of multiple robots working cooperatively, are easy to maintain and upgrade, and are suitable for handling a variety of common tasks.
Corresponding to the above method, some embodiments of the present application further provide a robot. The robot sends captured scene information to a controller, the scene information including local robot scene information and other robot scene information; the controller uses a control algorithm to compute on the scene information and obtain the action command corresponding to each robot; and the robot executes the corresponding action command, working together with other robots to complete at least one common task.

Specifically, the control algorithm is learned from a deep reinforcement learning neural network model.

The robot of this embodiment is used to implement the corresponding control methods in the foregoing method embodiments and has the beneficial effects of those embodiments, which are not repeated here. Likewise, for the implementation of the robot of this embodiment, reference may be made to the descriptions of the corresponding parts of the foregoing method embodiments, which are not repeated here either.

According to the operation control scheme for multiple robots provided in the embodiments of the present application, each robot captures scene information, and a control algorithm computes on that scene information to obtain the action command corresponding to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task. Because cooperative work is achieved through captured scene information and the control algorithm, collisions between robots can be avoided without real-time communication between them. The embodiments of the present application reduce the control cost of multiple robots working cooperatively, are easy to maintain and upgrade, and are suitable for handling a variety of common tasks.
Corresponding to the above method, some embodiments of the present application further provide a computer program product tangibly stored on a readable medium of a controller and comprising computer-executable instructions which, when executed, cause at least one processor to perform any of the above methods.

Corresponding to the above method, some embodiments of the present application further provide a computer-readable medium on which computer-executable instructions are stored; when executed, the computer-executable instructions cause at least one processor to perform the above methods.
It should be noted that, according to the needs of implementation, each component/step described in the embodiments of the present application can be split into more components/steps, and two or more components/steps, or partial operations of components/steps, can be combined into new components/steps to achieve the purpose of the embodiments of the present application.

The above methods according to the embodiments of the present application can be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code that is downloaded over a network, originally stored on a remote recording medium or a non-transitory machine-readable medium, and then stored on a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It can be understood that a computer, processor, microprocessor controller, or programmable hardware includes storage components (for example, RAM, ROM, flash memory, and the like) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the methods described herein are implemented. Furthermore, when a general-purpose computer accesses code for implementing the methods shown herein, the execution of that code converts the general-purpose computer into a special-purpose computer for executing the methods shown herein.

Those of ordinary skill in the art can appreciate that the units and method steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the embodiments of the present application.

The above implementations are only used to illustrate the embodiments of the present application and do not limit them. Those of ordinary skill in the relevant technical fields can make various changes and variations without departing from the spirit and scope of the embodiments of the present application, so all equivalent technical solutions also fall within the scope of the embodiments of the present application, whose scope of patent protection shall be defined by the claims.

Claims (17)

  1. A control method for multiple robots working cooperatively, comprising:
    receiving scene information captured by each robot, the scene information including local robot scene information and other robot scene information (S1);
    using a control algorithm to compute on the scene information to obtain an action command corresponding to each robot (S2);
    sending the corresponding action command to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task (S3).
  2. The method according to claim 1, wherein the scene information includes at least one of: a robot operation image, a robot operation force, a robot operation distance, and a robot operation angle.
  3. The method according to claim 1, wherein the action command includes a robot motion rotation angle and/or a robot motion torque.
  4. The method according to claim 1, wherein the scene information is captured by at least one sensor installed on each robot.
  5. The method according to claim 1, wherein using a control algorithm to compute on the scene information to obtain an action command corresponding to each robot (S2) comprises:
    obtaining a common task, and parsing the common task into a time-series-based action set and state set (S21);
    obtaining the local robot state and the other robot states according to the scene information (S22);
    using the control algorithm to compute on the local robot state, the other robot states, and the time-series-based action set and state set, to obtain the action command corresponding to each robot (S23).
  6. The method according to claim 1, wherein the control algorithm is learned from a deep reinforcement learning neural network model.
  7. The method according to claim 6, wherein the deep reinforcement learning neural network model is trained with various individual robots in a virtual environment; and/or the deep reinforcement learning neural network model is trained with multiple robots performing various common tasks in a virtual environment.
  8. The method according to claim 6, wherein the deep reinforcement learning neural network model is trained with mixed data combining virtual data obtained in a virtual environment with actual data from field tests of individual robots.
  9. The method according to claim 6, wherein the deep reinforcement learning neural network model is trained with actual data from field tests of multiple robots.
  10. The method according to claim 6, wherein the deep reinforcement learning neural network model is a continuous or a discrete function.
  11. The method according to claim 6, wherein the deep reinforcement learning neural network model adopts an asynchronous structure in training.
  12. A control system for multiple robots working cooperatively, comprising:
    multiple robots, each of which captures scene information, the scene information including local robot scene information and other robot scene information;
    a controller, which uses a control algorithm to compute on the scene information to obtain an action command corresponding to each robot, and sends the corresponding action command to each robot, so that each robot executes its corresponding action command and the robots work together to complete at least one common task.
  13. The system according to claim 12, wherein the control algorithm is learned from a deep reinforcement learning neural network model.
  14. A robot, wherein the robot sends captured scene information to a controller, the scene information including local robot scene information and other robot scene information; the controller uses a control algorithm to compute on the scene information to obtain an action command corresponding to each robot; and the robot executes the corresponding action command, working together with other robots to complete at least one common task.
  15. The robot according to claim 14, wherein the control algorithm is learned from a deep reinforcement learning neural network model.
  16. A computer program product tangibly stored on a readable medium of a controller and comprising computer-executable instructions which, when executed, cause at least one processor to perform the method according to any one of claims 1 to 11.
  17. A computer-readable medium having stored thereon computer-executable instructions which, when executed, cause at least one processor to perform the method according to any one of claims 1 to 11.
PCT/CN2021/119981 2021-09-23 2021-09-23 Control method for multiple robots working cooperatively, system and robot WO2023044676A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/119981 WO2023044676A1 (en) 2021-09-23 2021-09-23 Control method for multiple robots working cooperatively, system and robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/119981 WO2023044676A1 (en) 2021-09-23 2021-09-23 Control method for multiple robots working cooperatively, system and robot

Publications (1)

Publication Number Publication Date
WO2023044676A1 true WO2023044676A1 (en) 2023-03-30

Family

ID=85719817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119981 WO2023044676A1 (en) 2021-09-23 2021-09-23 Control method for multiple robots working cooperatively, system and robot

Country Status (1)

Country Link
WO (1) WO2023044676A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017121457A1 (en) * 2016-01-11 2017-07-20 Abb Schweiz Ag A collaboration system and a method for operating the collaboration system
CN109382825A (en) * 2017-08-08 2019-02-26 发那科株式会社 Control device and learning device
CN110587606A (en) * 2019-09-18 2019-12-20 中国人民解放军国防科技大学 Open scene-oriented multi-robot autonomous collaborative search and rescue method
US20200166952A1 (en) * 2018-11-27 2020-05-28 Institute For Information Industry Coach apparatus and cooperative operation controlling method for coach-driven multi-robot cooperative operation system
CN112465151A (en) * 2020-12-17 2021-03-09 电子科技大学长三角研究院(衢州) Multi-agent federal cooperation method based on deep reinforcement learning
CN113189983A (en) * 2021-04-13 2021-07-30 中国人民解放军国防科技大学 Open scene-oriented multi-robot cooperative multi-target sampling method

Similar Documents

Publication Publication Date Title
CN109397285B (en) Assembly method, assembly device and assembly equipment
Tang et al. A framework for manipulating deformable linear objects by coherent point drift
JP2023164459A (en) Efficient robot control based on inputs from remote client devices
CN109940619A (en) Trajectory planning method, electronic device and storage medium
CN109910018B (en) Robot virtual-real interaction operation execution system and method with visual semantic perception
KR20170102991A (en) Control systems and control methods
KR20210012672A (en) System and method for automatic control of robot manipulator based on artificial intelligence
JP2009066692A (en) Trajectory searching device
Bihlmaier et al. Robot unit testing
WO2019061690A1 (en) Mechanical arm inverse kinematics solution error determination and correction method and device
Brecher et al. Towards anthropomorphic movements for industrial robots
CN115026835A (en) Method for optimizing overall performance of robot mechanical arm servo system
WO2023044676A1 (en) Control method for multiple robots working cooperatively, system and robot
Huang et al. Control of a piecewise constant curvature continuum manipulator via policy search method
CN210115917U (en) Robot virtual-real interactive operation execution system with visual semantic perception
JP6383716B2 (en) Control device and control method for drone
JP2020052032A (en) Imaging device and imaging system
JP2020042787A (en) Automatic driving support method, driving device, support device, and computer-readable storage medium
CN117957500A (en) Control method and system for cooperative work of multiple robots and robot
KR20230147710A (en) Imitation learning in manufacturing environments
Queißer et al. Skill memories for parameterized dynamic action primitives on the pneumatically driven humanoid robot child affetto
CN109934155B (en) Depth vision-based collaborative robot gesture recognition method and device
US20200202178A1 (en) Automatic visual data generation for object training and evaluation
Solis et al. An underwater simulation server oriented to cooperative robotic interventions: The educational approach
Santos Learning from Demonstration using Hierarchical Inverse Reinforcement Learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21957812

Country of ref document: EP

Kind code of ref document: A1