CN114454176B - Robot control method, control device, robot, and storage medium

Robot control method, control device, robot, and storage medium

Info

Publication number
CN114454176B
CN114454176B
Authority
CN
China
Prior art keywords
data
robot
batch
network parameter
operation method
Prior art date
Legal status
Active
Application number
CN202210237386.2A
Other languages
Chinese (zh)
Other versions
CN114454176A (en)
Inventor
陈金亮
陈相羽
何旭
刘旭东
Current Assignee
Shenzhen Pengxing Intelligent Research Co Ltd
Original Assignee
Shenzhen Pengxing Intelligent Research Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Pengxing Intelligent Research Co Ltd filed Critical Shenzhen Pengxing Intelligent Research Co Ltd
Priority to CN202210237386.2A
Publication of CN114454176A
Application granted
Publication of CN114454176B


Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J9/161 Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00 Controls for manipulators
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661 Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Manipulator (AREA)

Abstract

The application discloses a control method and a control device of a robot, a robot, and a storage medium. The control method of the robot comprises the following steps: receiving a task instruction and acquiring the name or picture information of a target object to be operated according to the task instruction; identifying environmental object information from acquired visual information; analyzing the environmental object information and the target object to be operated through a task-oriented robot data set to acquire target object information and task scene information; and generating an operation instruction through a task-oriented robot operation strategy set according to the task instruction, the target object information and the task scene information, and controlling the robot to operate on the target object according to the operation instruction. In this way, the operation instruction can be obtained according to the task instruction, the environmental object information and the operation strategy set, and executing the operation instruction according to the task instruction allows the operation on the target object to be realized efficiently.

Description

Robot control method, control device, robot, and storage medium
Technical Field
The present invention relates to the field of robot control technology, and more particularly, to a control method and a control device for a robot, a robot, and a storage medium.
Background
With the continuous development of society, robots are increasingly applied to the fields of production, household use and the like. When the robot runs, corresponding operations are required to be executed according to the environment or user instructions so as to complete corresponding tasks.
However, existing robot control methods require building complex models such as kinematic and dynamic models, and it is difficult for them to face different tasks and to learn features spontaneously from a data set, without relying on expert knowledge, so as to obtain an operation strategy set with a high degree of abstraction and good robustness. How to make the robot accurately complete operations according to the environment or instructions, and how to address more specifically the expression of the target object (target object identification, etc.) and the perception and understanding of the robot (grasping points and operation planning for the target object, etc.), are problems to be solved.
Disclosure of Invention
The embodiment of the application provides a control method and device for a robot, the robot and a storage medium.
The control method of the robot according to the embodiment of the application includes: receiving a task instruction and acquiring the name or picture information of a target object to be operated according to the task instruction;
identifying environmental object information according to the acquired visual information;
analyzing the environmental object information and the name or picture information of the target object to be operated through a task-oriented robot data set to acquire target object information and task scene information;
generating an operation instruction through a task-oriented robot operation strategy set according to the task instruction, the target object information and the task scene information, and controlling the robot to operate on the target object according to the operation instruction.
The control device for a robot according to an embodiment of the present application includes:
the first acquisition module is used for receiving a task instruction and acquiring the name or picture information of a target object to be operated according to the task instruction;
the identification module is used for identifying the environmental object information according to the acquired visual information;
the second acquisition module is used for analyzing the environmental object information and the object to be operated through a robot data set facing the task to acquire object information and task scene information;
the control module generates an operation instruction through a task-oriented robot operation strategy set according to the task instruction, the target object information and the task scene information, and controls the robot to operate on the target object according to the operation instruction.
The robot according to the embodiment of the application comprises a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the control method of the robot according to the above embodiment when executing the computer program.
The computer-readable storage medium according to the embodiment of the present application stores a computer program that, when executed by a processor, realizes the control method of the robot according to the embodiment described above.
According to the control method, the control device, the robot and the storage medium of the embodiments of the application, the operation instruction can be obtained according to the task instruction, the environmental object information and the operation strategy set, so that the operation instruction is executed according to the task instruction and the operation on the target object is achieved.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a flow chart of a control method of a robot according to an embodiment of the present application;
Fig. 2 is a schematic block diagram of a control device of the robot according to the embodiment of the present application;
FIG. 3 is a block schematic diagram of a robot according to an embodiment of the present application;
fig. 4 is a schematic hardware configuration of the multi-legged robot according to the embodiment of the present application;
fig. 5 is a schematic structural view of the multi-legged robot according to the embodiment of the present application;
fig. 6 is another flow diagram of a control method of the robot according to the embodiment of the present application;
fig. 7 is another block diagram of a control device of the robot according to the embodiment of the present application;
fig. 8 is a schematic view of another flow of the control method of the robot according to the embodiment of the present application;
fig. 9 is a block diagram of a fifth acquisition module of the robot according to the embodiment of the present application;
fig. 10 is a schematic view of still another flow of the control method of the robot according to the embodiment of the present application;
fig. 11 is a schematic view of still another flow of the control method of the robot according to the embodiment of the present application;
fig. 12 is a further schematic block diagram of a control device of the robot according to the embodiment of the present application;
fig. 13 is a schematic view of a further flow of the control method of the robot according to the embodiment of the present application.
Description of main reference numerals:
the control apparatus 100, the first acquisition module 10, the identification module 20, the second acquisition module 30, the control module 50, the fifth acquisition module 60, the setting unit 61, the acquisition unit 62, the screening unit 63, the screening module 70, the storing module 80, the robot 1000, the processor 300, the memory 200, the multi-legged robot 400, the mechanical unit 401, the driving board 4011, the motor 4012, the mechanical structure 4013, the body main body 4014, the extendable leg 4015, the foot 4016, the rotatable head structure 4017, the swingable tail structure 4018, the carrier structure 4019, the saddle structure 4020, the camera structure 4021, the communication unit 402, the sensing unit 403, the interface unit 404, the storage unit 405, the display unit 406, the display panel 4061, the input unit 407, the touch panel 4071, the input device 4072, the touch detection apparatus 4073, the touch controller 4074, the machine control module 410, and the power source 411.
Detailed Description
Embodiments of the present application are described in detail below, and are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the following description, suffixes such as "module", "component", or "unit" for representing components are used only for facilitating the description of the present invention, and have no specific meaning in themselves. Thus, "module," "component," or "unit" may be used in combination.
Referring to fig. 1, a control method of a robot 1000 according to an embodiment of the present application includes the steps of:
s10: receiving a task instruction and acquiring the name or picture information of a target object to be operated according to the task instruction;
s20: identifying environmental object information according to the acquired visual information;
s40: the method comprises the steps of analyzing environmental object information and names or picture information of objects to be operated through a task-oriented robot 1000 data set to obtain object information and task scene information;
S50: and generating an operation strategy set instruction through the operation strategy set according to the task instruction, the target object information and the task scene information, and controlling the robot 1000 to operate on the target object according to the operation instruction.
Referring to fig. 2, a control device 100 of a robot 1000 according to an embodiment of the present application includes a first acquisition module 10, an identification module 20, a second acquisition module 30, and a control module 50. Step S10 may be implemented by the first acquisition module 10, step S20 may be implemented by the identification module 20, step S40 may be implemented by the second acquisition module 30, and step S50 may be implemented by the control module 50.
That is, the first obtaining module 10 may be configured to obtain the object to be operated according to the task instruction, the identifying module 20 may be configured to identify the environmental object information according to the obtained visual information, the second obtaining module 30 may be configured to obtain the object information and the task scene information by analyzing the environmental object information and the object to be operated through the task-oriented robot 1000 data set, and the control module 50 may be configured to generate the operation instruction according to the task instruction, the object information and the task scene information through the task-oriented robot operation policy set, and control the robot 1000 to operate the object according to the operation instruction.
Referring to fig. 3, a robot 1000 according to an embodiment of the present application includes a memory 200, a processor 300, and a computer program stored on the memory 200 and executable on the processor 300, and the processor 300 implements the control method of the robot 1000 according to the embodiment of the present application when executing the computer program. As such, the control method of the robot 1000 according to the embodiment of the present application may be implemented by the robot 1000 according to the embodiment of the present application, where each of step S10, step S20, step S40 and step S50 may be implemented by the processor 300. That is, when the processor 300 executes the computer program, it implements: receiving a task instruction and acquiring a target object to be operated according to the task instruction; identifying environmental object information according to the acquired visual information; analyzing the environmental object information and the name or picture information of the target object to be operated through the task-oriented robot 1000 data set to acquire target object information and task scene information; and generating an operation instruction through the task-oriented robot operation strategy set according to the task instruction, the target object information and the task scene information, and controlling the robot 1000 to operate on the target object according to the operation instruction.
The processor 300 may be referred to as a drive board. The drive board may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
Specifically, in order to realize that the robot can accurately complete the operation according to the environment and the instruction, a machine learning method is generally adopted to train the robot, and in the machine learning process, a data set is particularly important as a basis of machine learning. In existing machine learning, the data set typically includes only information about the target object, and lacks content about the set of robot operating strategies. Therefore, the present invention proposes a control method of the task-oriented robot 1000 to more intensively solve the expression of the target object (target object identification, etc.) and the perception and understanding of the robot (grasping point and operation planning for the target object, etc.).
It should be noted that the robot 1000 may be a movable biped robot or multi-legged robot 400, such as a humanoid robot, a robot dog, a robot horse, or the like, and may also include a mechanical arm, a mechanical leg, or the like, such as a wine-brewing robot, a welding robot, or the like, which is not particularly limited herein.
Specifically, referring to fig. 4 and 5, fig. 4 is a schematic hardware structure of a multi-legged robot 400 according to one embodiment of the present invention, and fig. 5 is a schematic structure of the multi-legged robot 400. In the embodiment shown in fig. 4, the multi-legged robot 400 includes a mechanical unit 401, a communication unit 402, a sensing unit 403, an interface unit 404, a storage unit 405, a machine control module 410, and a power source 411. The various components of the multi-legged robot 400 can be connected in any manner, including wired or wireless connections, and the like. It will be appreciated by those skilled in the art that the specific structure of the multi-legged robot 400 illustrated in fig. 4 does not constitute a limitation of the multi-legged robot 400, the multi-legged robot 400 may include more or less components than illustrated, and that certain components do not necessarily constitute the multi-legged robot 400, may be omitted entirely or combined as necessary within the scope of not changing the essence of the invention.
The various components of the multi-legged robot 400 are described in detail below in conjunction with fig. 4 and 5:
the mechanical unit 401 is hardware of the multi-legged robot 400. As shown in fig. 4, the mechanical unit 401 may include a drive board 4011, a motor 4012, a mechanical structure 4013, and as shown in fig. 5, the mechanical structure 4013 may include a fuselage body 4014, extendable legs 4015, feet 4016, and in other embodiments, the mechanical structure 4013 may further include extendable robotic arms (not shown), a rotatable head structure 4017, a swingable tail structure 4018, a cargo structure 4019, a saddle structure 4020, a camera structure 4021, and the like. It should be noted that, the number of the component modules of the machine unit 401 may be one or more, and may be set according to the specific situation, for example, the number of the legs 4015 may be 4, each leg 4015 may be configured with 3 motors 4012, and the number of the corresponding motors 4012 is 12. It will be appreciated that the extendable robotic arm or extendable leg structure may be mounted on the back, tail, etc. of the multi-legged robot 400, which may be adjusted according to the use, production cost, etc. of the multi-legged robot 400, without specific limitation.
The communication unit 402 may be used for receiving and transmitting signals, or may be used for communicating with a network and other devices, for example, receiving command information sent by the remote controller or other multi-legged robot 400 to move in a specific direction at a specific speed value according to a specific gait, and then transmitting the command information to the machine control module 410 for processing. The communication unit 402 includes, for example, a WiFi module, a 4G module, a 5G module, a bluetooth module, an infrared module, and the like.
The sensing unit 403 is used for acquiring information data of the environment surrounding the multi-legged robot 400 and monitoring parameter data of each component inside the multi-legged robot 400, and for sending these data to the machine control module 410. The sensing unit 403 includes various sensors. Sensors that acquire information of surrounding objects include: lidar (for remote object detection, distance determination and/or speed value determination), millimeter-wave radar (for short-range object detection, distance determination and/or speed value determination), cameras, infrared cameras, global navigation satellite systems (GNSS, Global Navigation Satellite System), and the like. Sensors that monitor the components inside the multi-legged robot 400 include: an inertial measurement unit (IMU, Inertial Measurement Unit) (for measuring velocity, acceleration and angular velocity values), plantar sensors (for monitoring the plantar force point position, plantar posture, and touchdown force magnitude and direction), and temperature sensors (for detecting component temperature). As for other sensors that may further be configured for the multi-legged robot 400, such as a load sensor, a touch sensor, a motor angle sensor and a torque sensor, the detailed description thereof is omitted.
The interface unit 404 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more components within the multi-legged robot 400, or may be used to output (e.g., data information, power, etc.) to an external device. The interface unit 404 may include a power port, a data port (e.g., a USB port), a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, and the like.
The storage unit 405 is used to store a software program and various data. The storage unit 405 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system program, a motion control program, an application program (such as a text editor), and the like; the data storage area may store data generated by the multi-legged robot 400 in use (such as various sensed data acquired by the sensing unit 403, log file data), and the like. In addition, the storage unit 405 may include high-speed random access memory, and may also include nonvolatile memory, such as disk memory, flash memory, or other volatile solid state memory. It will be appreciated that memory 200 may implement some or all of the functionality of storage unit 405.
The display unit 406 is used to display information input by a user or information provided to the user. The display unit 406 may include a display panel 4061, and the display panel 4061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
The input unit 407 may be used to receive input numeric or character information. Specifically, the input unit 407 may include a touch panel 4071 and other input devices 4072. The touch panel 4071, also referred to as a touch screen, may collect touch operations of a user (e.g., operations of the user on the touch panel 4071 or in the vicinity of the touch panel 4071 using a palm, a finger, or a suitable accessory), and drive the corresponding connection device according to a preset program. The touch panel 4071 may include two parts, a touch detection device 4073 and a touch controller 4074. Wherein, the touch detection device 4073 detects the touch orientation of the user, detects a signal caused by the touch operation, and transmits the signal to the touch controller 4074; the touch controller 4074 receives touch information from the touch detection device 4073 and converts it into touch point coordinates, which are then sent to the machine control module 410, and can receive commands sent from the machine control module 410 and execute them. The input unit 407 may include other input devices 4072 in addition to the touch panel 4071. In particular, other input devices 4072 may include, but are not limited to, one or more of a remote operated handle, etc., and are not limited specifically herein.
Further, the touch panel 4071 may overlay the display panel 4061, and when the touch panel 4071 detects a touch operation thereon or thereabout, the touch operation is transmitted to the machine control module 410 to determine the type of touch event, and then the machine control module 410 provides a corresponding visual output on the display panel 4061 according to the type of touch event. Although in fig. 4, the touch panel 4071 and the display panel 4061 are implemented as two independent components to implement the input and output functions, in some embodiments, the touch panel 4071 may be integrated with the display panel 4061 to implement the input and output functions, which is not limited herein.
The machine control module 410 is a control center of the multi-legged robot 400, connects the respective components of the entire multi-legged robot 400 using various interfaces and lines, and performs overall control of the multi-legged robot 400 by running or executing a software program stored in the storage unit 405, and calling data stored in the storage unit 405. It is appreciated that the processor 300 may implement some or all of the functionality of the machine control module 410.
The power supply 411 is used to supply power to the various components, and the power supply 411 may include a battery and a power control board for controlling functions such as battery charging, discharging, and power consumption management. In the embodiment shown in fig. 4, the power source 411 is electrically connected to the machine control module 410, and in other embodiments, the power source 411 may be further electrically connected to the sensor unit 403 (such as a camera, a radar, a speaker, etc.), and the motor 4012, respectively. It should be noted that each component may be connected to a different power source 411, or may be powered by the same power source 411.
On the basis of the above embodiments, specifically, in some embodiments, the terminal device may be in communication connection with the multi-legged robot 400, when the terminal device communicates with the multi-legged robot 400, instruction information may be sent to the multi-legged robot 400 through the terminal device, the multi-legged robot 400 may receive the instruction information through the communication unit 402, and the instruction information may be transmitted to the machine control module 410 when the instruction information is received, so that the machine control module 410 may process to obtain the target speed value according to the instruction information. Terminal devices include, but are not limited to: a mobile phone, a tablet personal computer, a server, a personal computer, a wearable intelligent device and other electrical equipment with an image shooting function.
The instruction information may be determined according to preset conditions. In one embodiment, the multi-legged robot 400 may include a sensing unit 403, and the sensing unit 403 may generate instruction information according to the current environment in which the multi-legged robot 400 is located. The machine control module 410 may determine whether the current speed value of the multi-legged robot 400 satisfies the corresponding preset condition according to the instruction information. If so, the current speed value and current gait movement of the multi-legged robot 400 are maintained; if not, the target speed value and the corresponding target gait are determined according to the corresponding preset conditions, so that the multi-legged robot 400 can be controlled to move at the target speed value and the corresponding target gait. The environmental sensor may include a temperature sensor, a barometric pressure sensor, a visual sensor, an acoustic sensor. The instruction information may include temperature information, air pressure information, image information, sound information. The communication between the environmental sensor and the machine control module 410 may be wired or wireless. Means of wireless communication include, but are not limited to: wireless networks, mobile communication networks (3G, 4G, 5G, etc.), bluetooth, infrared.
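For ease of understanding, the decision described above may be sketched as follows (a minimal Python sketch; the `preset` object and its methods are illustrative assumptions, not part of the application):

```python
# Minimal sketch of the preset-condition check described above.
# The `preset` object and its methods are illustrative assumptions.
def update_motion(current_speed, current_gait, instruction, preset):
    """Keep the current speed value and gait if the preset condition is
    satisfied; otherwise switch to the target speed value and target gait."""
    if preset.satisfied(current_speed, instruction):
        return current_speed, current_gait
    return preset.target_speed(instruction), preset.target_gait(instruction)
```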
It will be appreciated that as society continues to develop, robots 1000 are increasingly being used in production, home use, etc. When the robot 1000 is running, it needs to perform corresponding operations according to the environment or user instructions to complete corresponding tasks. Therefore, how to enable the robot 1000 to accurately perform operations according to the environment or instructions is a technical problem to be solved.
In the control method and the control device 100 of the robot 1000 according to the embodiments of the present application, the operation instruction can be obtained according to the task instruction, the environmental object information, and the operation policy set, so that the operation instruction is executed according to the task instruction and the operation on the target object can be efficiently realized.
Specifically, in the embodiment of the present application, the task instruction may be grasping, pushing, taking, or the like, which is not particularly limited herein. For example, the task instruction may be to take a cup, push a chair, open a door, or the like, in which case the task instruction is to take a cup, obtain an object to be operated as a cup, in which case the task instruction is to push a chair, the object to be operated is a chair, and in which case the task instruction is to open a door, the object to be operated is a door and a door handle. There are many methods for obtaining name information of an object to be operated according to a task instruction, which may be to specify the object to be operated when an operation instruction is input, or set a model related to a correspondence between the operation instruction and the object to be operated, input the operation instruction into the model, and analyze the operation instruction to obtain the object to be operated. For ease of understanding, the following is illustrative. In some embodiments, the input operation instruction is "out of room", the task instruction of "out of room" is received, and the name information of the object to be operated is obtained as "door". It should be noted that, the process of acquiring the name information of the object to be operated as "door" according to the task instruction "out of the room" may be obtained according to the corresponding relationship between the task instruction set in advance and the name information of the object to be operated, or may be that the user inputs the name information of the object to be operated as "door" according to the task instruction "out of the room", which is not listed here. In other embodiments, the input operation instruction is "door opening", the task instruction of "door opening" is received, and the name information of the object to be operated is obtained as "door".
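For illustration, the first approach described above, a preset correspondence between task instructions and the name information of the object to be operated, may be sketched as follows (a minimal Python sketch; the table entries and names are illustrative assumptions drawn from the examples above):

```python
# Illustrative sketch: a preset correspondence table between task instructions
# and the name information of the object to be operated (entries are examples
# from the description, not an exhaustive set).
TASK_TO_TARGET = {
    "out of room": "door",
    "open door": "door",
    "take cup": "cup",
    "push chair": "chair",
}

def target_name_from_instruction(task_instruction: str) -> str:
    """Return the name information of the object to be operated."""
    try:
        return TASK_TO_TARGET[task_instruction]
    except KeyError:
        # In practice the mapping could instead come from a learned model
        # or from a name entered by the user.
        raise ValueError(f"no target object registered for '{task_instruction}'")
```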
In the process of identifying the environmental object information according to the acquired visual information, the acquired visual information may be processed to identify each specific object in the environment, and the environmental object information includes information of each specific object in the environment. Specifically, the information of each specific object may include size information of the object, color information of the object, material of the object, and the like, and is not particularly limited herein.
The environmental object information and the name or picture information of the target object to be operated can be analyzed through the task-oriented robot 1000 data set, so that the target object information and the task scene information are obtained. The target object is the object that needs to be operated on in the task instruction, and the task scene information may include non-target objects, that is, objects that are not operated on in the task instruction, and obstacles that need to be avoided in the task instruction, and detailed description thereof is omitted.
It should be noted that, the environmental object information may include size information, color information, material, whether the operation can be performed by grabbing, dragging or rotating, and the like, and the target object information and the task scene information may be obtained from the environmental object information in step S40, respectively. The environmental object information may also include simplified information such as size information and color information of each object, and then the object and task scene information obtained in S40 are analyzed in detail to obtain information such as the material of the object, whether the object can be operated in a grabbing, dragging or rotating manner, and the pose relationship between the object and the robot 1000. Thus, the step S20 can be prevented from being performed for a long time, and the environment object information content is prevented from being excessively redundant.
The operation instruction may include specific steps of operation and specific parameters of execution, for example, the robot 1000 compliant operation related to the task of opening the door and going out, the robot 1000 motion planning, etc., and then the operation instruction may include force sense information input required in the compliant operation, parameters and track of the robot 1000 compliant operation, visual information input required in the motion planning, parameters and implementation track of the robot 1000 motion planning, etc.
It is worth noting that the task-oriented robot operation strategy set stores the correspondence between the task instruction, the target object information, the task scene information and the operation instruction, and can output the corresponding operation instruction according to the task instruction, the target object information and the task scene information, so that the robot is controlled to operate on the target object according to the operation instruction.
Referring to fig. 6, in some embodiments, a control method of a robot 1000 includes:
s60, obtaining target network parameter data according to the first batch of operation method data, the first batch of state data, the first batch of observation value data, the first batch of success measurement data and a preset network parameter set, wherein the target network parameter data is used for representing the corresponding relation between the first batch of operation method data, the first batch of state data, the first batch of observation value data and the first batch of success measurement data, and the network parameter set comprises a plurality of training network parameters;
And S70, screening out the preferred operation method data according to the second operation method data, the second state data, the second observation value data and the target network parameter data, and storing the preferred operation method data into an operation policy set.
In some embodiments, referring to fig. 7, the control device of the robot 1000 includes a fifth obtaining module 60 and a screening module 70, step S60 may be implemented by the fifth obtaining module 60, and step S70 may be implemented by the screening module 70. That is, the fifth obtaining module 60 may be configured to obtain target network parameter data according to the first operating method data, the first state data, the first observation data, the first success metric data, and a preset network parameter set, where the target network parameter data is used to represent a correspondence between the first operating method data, the first state data, the first observation data, and the first success metric data, and the network parameter set includes a plurality of training network parameters. The filtering module 70 may be configured to filter out preferred operation method data according to the second batch of operation method data, the second batch of status data, the second batch of observation data, and the target network parameter data, and store the preferred operation method data in the operation policy set.
In some embodiments, robot 1000 includes a processor 300. Step S60 and step S70 may be implemented by the processor 300, that is, when the processor 300 executes the computer program, the processor 300 implements obtaining target network parameter data according to the first batch of operation method data, the first batch of state data, the first batch of observation value data, the first batch of success metric data and a preset network parameter set, where the target network parameter data is used to represent a correspondence between the first batch of operation method data, the first batch of state data, the first batch of observation value data and the first batch of success metric data, and the network parameter set includes a plurality of training network parameters; and screening out the preferred operation method data according to the second operation method data, the second state data, the second observation value data and the target network parameter data, and storing the preferred operation method data into an operation strategy set.
In this way, network parameters can be obtained, so that the operation method data is filtered according to the operation method data, the state data, the observed value data and the target network parameter data.
Specifically, the operation method data is used to represent data of each operation method of the robot 1000, and the operation method data may include a distance required to be moved by the robot 1000 in a process of the robot 1000 achieving gripping of the object, a step required to be moved by the robot 1000, an angle required to be rotated by the robot 1000 in a process of opening the object, a step required to be rotated and moved by the robot 1000, and the like. For example, the robot 1000 includes a robot arm, the operation method data may include a plurality of steps, wherein step 1 is that the robot arm extends forward by 90 cm, step 2 is that the robot arm moves downward by 40 cm, step 3 is that the clamping jaw of the robot arm is closed until the force sensor senses a reaction force, and step 4 is that the clamping jaw of the robot arm rotates by 90 degrees.
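For ease of understanding, the operation method data of the mechanical-arm example above may be sketched as an ordered list of steps (a minimal Python sketch; the field names and values are illustrative assumptions, not part of the application):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class OperationStep:
    action: str                    # e.g. "extend_forward", "move_down", "close_jaw", "rotate_jaw"
    amount_cm_or_deg: float        # centimetres for translations, degrees for rotations
    stop_on_force: bool = False    # stop when the force sensor senses a reaction force

# Operation method data for the four-step mechanical-arm example above (values illustrative).
example_method: List[OperationStep] = [
    OperationStep("extend_forward", 90.0),
    OperationStep("move_down", 40.0),
    OperationStep("close_jaw", 0.0, stop_on_force=True),
    OperationStep("rotate_jaw", 90.0),
]
```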
It should be noted that the operation method data is obtained in a preset manner, and may be obtained by means of simulation, manual input, and the like, which is not particularly limited herein.
The state data may include parameters such as the shape of the target object, the coordinate position of the robot 1000, the pose relationship between the target object and the robot 1000, the friction coefficient between the robot 1000 and the target object, and the like. It can be understood that the state data may contain all the state information needed in the process of the robot 1000 carrying out the task instruction. For example, if the robot 1000 includes a robotic arm and the task instruction is to take a cup in front of it, the state data may include that the cup is 60 cm in front of the robotic arm, the shape of the cup, the coefficient of friction between the cup and the jaws of the robotic arm, and so on. It is worth noting that the shape of the cup may be fitted from a point cloud. Specifically, point cloud data refers to a set of vectors in a three-dimensional coordinate system. The point cloud data may include not only the three-dimensional coordinates of each point but also color information, reflection intensity information, and the like of each point, without being particularly limited thereto. The point cloud data may be collected by the robot 1000 as needed, obtained from pre-stored point cloud data, or obtained from other devices.
The point cloud data may be obtained by a vision sensor. The vision sensor may be a 3D industrial camera. The vision sensor can collect the point cloud data of the building. The point cloud data includes coordinate values of XYZ three axes of the point in space, including XYZ three-axis orientations of the point cloud itself. And performing fine operations such as point cloud filtering, outlier rejection and the like on the point cloud data to form accurate point cloud data.
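For illustration, a simple statistical outlier-rejection step of the kind mentioned above may be sketched as follows (a minimal Python sketch operating on an (N, 3) XYZ array; the parameter values are illustrative assumptions):

```python
import numpy as np

def remove_outliers(points: np.ndarray, k: int = 16, std_ratio: float = 2.0) -> np.ndarray:
    """Statistical outlier rejection on an (N, 3) XYZ point cloud: drop points
    whose mean distance to their k nearest neighbours is more than `std_ratio`
    standard deviations above the cloud-wide average."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # (N, N)
    knn = np.sort(dists, axis=1)[:, 1:k + 1]       # skip the zero self-distance
    mean_knn = knn.mean(axis=1)
    keep = mean_knn < mean_knn.mean() + std_ratio * mean_knn.std()
    return points[keep]
```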
The observations data describe observations in the current operational plan. It will be appreciated that in the course of the robot 1000 completing the task instructions, visual image information of the environment including the environmental object and the target object needs to be acquired, the visual image information including RGB information, depth information, and the RGB information, the depth information data being used for the calculation of the subsequent operation strategy. Observations are data describing the current scene, which may be obtained from a visually entered depth map or point cloud. The depth image is captured based on the depth camera, and the depth image is acquired based on a depth camera coordinate system when the depth image is captured.
The success metric data is used to describe whether the task instruction was completed successfully. The success metric data may be represented by S(u, x), where u is the operation method data and x is the state data. When the robustness metric E_Q (which can be understood as the operation quality) is greater than a threshold δ (an empirical value), S(u, x) = 1; otherwise S(u, x) = 0. That is, when the robustness metric is greater than the threshold the operation is considered successful, and when it is less than the threshold the operation is considered failed. In some embodiments, the robustness metric may depend on factors such as the friction coefficient after the robot contacts the object to be operated, the robot operation error, and the like.
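For ease of understanding, the success metric S(u, x) described above may be sketched as follows (a minimal Python sketch; the threshold value is an illustrative assumption):

```python
def success_metric(robustness: float, delta: float = 0.5) -> int:
    """S(u, x): 1 if the robustness metric E_Q computed for operation u in
    state x exceeds the empirical threshold delta, else 0 (delta illustrative)."""
    return 1 if robustness > delta else 0
```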
It should be noted that, before the target network parameter data is obtained according to the first batch of operation method data, the first batch of state data, the first batch of observation value data, the first batch of success measurement data and the preset network parameter set, the first batch of operation method data may be input to simulation software or input to the robot 1000, and the simulation software obtains the corresponding state data, the observation value data and the success measurement data in the simulation process, or the robot 1000 obtains the state data, the observation value data and the success measurement data in the actual operation. And taking the state data, the observed value data and the success measurement data as preset data to obtain target network parameter data.
The network parameter set includes a plurality of training network parameters, and the target network parameters that fit the correspondence between the operation method data, the state data, the observation value data and the success metric data relatively well are screened out from the plurality of training network parameters according to the operation method data, the state data, the observation value data and the success metric data. Preferably, the training network parameters that best fit the correspondence between the operation method data, the state data, the observation value data and the success metric data are selected as the target network parameters. It should be noted that, in order to ensure the accuracy of the target network parameters, enough operation method data, state data, observation value data and success metric data should be input, so as to avoid inaccurate target network parameters caused by too small a sample size.
It can be appreciated that the network parameter set is a preset set including a plurality of training network parameters, and specific values of the training network parameters can be set empirically. It can be appreciated that presetting a number of training network parameters corresponds to setting initial parameter values before iteration through an algorithm, which is a common practice and will not be described in detail herein.
For ease of understanding, the following is illustrative. Obtaining the correspondence, that is, training the target network parameters, requires continuously calculating on the input data and the required output data to obtain the correspondence between input and output. Taking the input-output model Y = aX + b as an example, X is the input, Y is the output, and a and b are constants. In this embodiment, the input X is the operation method data, the state data and the observation value data, and the output Y is the success metric data. A plurality of operation method data, state data, observation value data and the corresponding success metric data are continuously input into the model, so as to obtain a and b that are closer to the correspondence between input and output; these a and b are the target network parameter data. It will be appreciated that the training network parameter set includes a plurality of candidates a' and b', from which the corresponding target network parameter data a and b are screened. Preferably, the target network parameter data are the data closest to the correspondence between input and output.
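For illustration, fitting the constants a and b of the toy model Y = aX + b may be sketched as a least-squares problem (a minimal Python sketch; the sample values are illustrative assumptions):

```python
import numpy as np

# Fit the toy model Y = a*X + b by least squares; a and b play the role of the
# target network parameter data in the explanation above (sample values illustrative).
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

A = np.stack([X, np.ones_like(X)], axis=1)         # design matrix [X, 1]
(a, b), *_ = np.linalg.lstsq(A, Y, rcond=None)     # minimises ||A @ [a, b] - Y||
print(f"a = {a:.3f}, b = {b:.3f}")
```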
It can be understood that the above is the training process for obtaining the target network parameter data. After the training is completed, the use process is as follows: the newly acquired operation method data, state data and observation value data are input into the model with the target network parameters to obtain a predicted value of the success metric data, and the operation method data are screened according to this predicted value. That is, the screened operation method data, when fed into the model, yield a predicted success metric greater than a certain threshold, which indicates that the screened operation method data have a higher success rate of completing the task instruction. The screened operation method data are then stored into the operation strategy set, so that when the robot 1000 faces a task instruction, the operation method data with a higher success rate of completing the task instruction can be conveniently called from the operation strategy set.
It should be noted that different task instructions may correspond to different target network parameter data, and data of the same type of task instruction should be used in the process of training the target network parameter data. Otherwise, if the data related to the task instruction "pick up the cup" and the data related to the task instruction "open the door" are trained together, accurate target network parameters are difficult to obtain, and in actual operation the robot 1000 may perform a door-opening action when the task instruction is "pick up the cup", or perform a cup-picking action when the task instruction is "open the door".
Referring to fig. 8, in some embodiments, step S60 includes the steps of:
s61, setting a robustness function taking training network parameters as parameters according to a network parameter set, wherein the robustness function is used for calculating mathematical expectations;
s63, inputting the first operation method data, the first state data and the first observation value data into a robustness function to obtain mathematical expected data corresponding to training network parameters;
s65, screening corresponding network parameter data in the network parameter set according to the mathematical expected data and the success measurement data corresponding to the training network parameters.
In this way, the corresponding network parameter data can be screened.
In some embodiments, referring to fig. 9, the fifth obtaining module 60 includes a setting unit 61, an obtaining unit 62, and a screening unit 63. Step S61 may be implemented by the setting unit 61, step S63 may be implemented by the obtaining unit 62, and step S65 may be implemented by the screening unit 63, that is, the setting unit 61 is configured to set a robustness function with training network parameters as parameters according to a network parameter set, the robustness function is used to calculate mathematical expectations, the obtaining unit 62 is configured to input first batch operation method data, first batch state data and first batch observation value data into the robustness function, obtain mathematical expectancy data corresponding to the training network parameters, and the screening unit 63 is configured to screen corresponding network parameter data in the network parameter set according to the mathematical expectancy data and success metric data corresponding to the training network parameters.
In some embodiments, robot 1000 includes a processor 300. Step S61, step S63 and step S65 may be implemented by the processor 300, that is, the processor 300 implements setting a robustness function using the training network parameter as a parameter according to the network parameter set when executing the computer program, where the robustness function is used to calculate the mathematical expectation; inputting the first batch of operation method data, the first batch of state data and the first batch of observation value data into a robustness function to obtain mathematical expected data corresponding to training network parameters; and screening the corresponding network parameter data in the network parameter set according to the mathematical expected data and the success measurement data corresponding to the training network parameters.
Specifically, the robustness function may be represented by the formula Q(u, y) = E[S | y], where u is the operation method data, y is the observation value, and S is the success metric data. The robustness function can be understood as the mathematical expectation of successfully performing the operation, i.e. of S, under the joint distribution of operations and observations. The parameters of the robustness function are the training network parameters. In the process of obtaining the corresponding target network parameters, each training network parameter is substituted into the robustness function, so that mathematical expectation data corresponding to each training network parameter are obtained; the mathematical expectation data are compared with the success metric data, and the training network parameter with the smallest gap between its mathematical expectation data and the success metric data is taken as the target network parameter.
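For ease of understanding, the mathematical expectation Q(u, y) = E[S | y] may be estimated from recorded samples as follows (a minimal Python sketch; the data layout is an illustrative assumption):

```python
from collections import defaultdict

def estimate_robustness(samples):
    """Monte Carlo estimate of Q(u, y) = E[S | y] from recorded
    (operation, observation, success) triples; u and y are assumed hashable."""
    totals = defaultdict(lambda: [0, 0])            # (u, y) -> [sum of S, count]
    for u, y, s in samples:
        totals[(u, y)][0] += s
        totals[(u, y)][1] += 1
    return {key: sum_s / n for key, (sum_s, n) in totals.items()}
```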
Referring to fig. 10, in some embodiments, S65 may include the steps of:
s651: calculating the gap expectation of the mathematical expectation data and the success measurement data corresponding to each training network parameter data according to the mathematical expectation data and the success measurement data corresponding to the training network parameters;
s653: and comparing the gap expectations corresponding to the training network parameter data, and selecting the training network parameter data with the minimum gap expectations as target network parameter data.
In this way, the training network parameter with the smallest gap expectation value can be obtained as the target network parameter data.
In some embodiments, the above step S651 and step S653 may be implemented by the screening unit 63, that is, the screening unit 63 may be configured to calculate, according to the mathematical expectation data and the success metric data corresponding to the training network parameters, a gap expectation between the mathematical expectation data and the success metric data corresponding to each training network parameter data; and comparing the gap expectations corresponding to the training network parameter data, and selecting the training network parameter data with the minimum gap expectations as the network parameter data.
In some embodiments, robot 1000 includes a processor 300. Step S651 and step S653 may be implemented by the processor 300, that is, the processor 300 implements, when executing the computer program, calculating a gap expectation between mathematical expectation data and success metric data corresponding to each training network parameter data according to mathematical expectation data and success metric data corresponding to the training network parameter; and comparing the gap expectations corresponding to the training network parameter data, and selecting the training network parameter data with the minimum gap expectations as the network parameter data.
Specifically, the mathematical expectation data corresponding to the target network parameter data should have the smallest gap to the success metric data, so the training network parameter data whose expected gap is smallest are selected as the target network parameter data. The above procedure may employ the formula:
θ* = argmax_{θ∈Θ} E_{(S,u,x,y)}[ L(S, Q_θ(u, y)) ]
where θ* denotes the target network parameter data, θ denotes training network parameter data, Θ denotes the network parameter set, S denotes the success metric data, u denotes the operation method data, x denotes the state data, y denotes the observation value, L denotes a loss function, E denotes the expectation, and Q_θ(u, y) denotes the robustness function with parameter θ.
It should be noted that, the specific content of the loss function L may be adjusted according to needs, so long as the loss function L can compare the difference between the robustness function and the success metric data, and detailed description is omitted herein.
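For illustration, screening the target network parameter data from the network parameter set may be sketched as follows, interpreting the loss L as a squared gap so that the parameter with the smallest expected gap between Q_θ(u, y) and the success metric S is selected (a minimal Python sketch; the function names and the choice of loss are illustrative assumptions):

```python
import numpy as np

def select_target_parameter(theta_set, q_fn, first_batch):
    """Screen the network parameter set: return the theta whose predictions
    Q_theta(u, y) have the smallest expected gap (here a mean squared error,
    one possible choice of the loss L) to the success metric S over the first
    batch of (u, x, y, S) samples."""
    def gap(theta):
        preds = np.array([q_fn(theta, u, y) for (u, x, y, s) in first_batch])
        succ = np.array([s for (u, x, y, s) in first_batch])
        return np.mean((preds - succ) ** 2)
    return min(theta_set, key=gap)
```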
Referring to fig. 11, in some embodiments, S70 includes the steps of:
S71: under given state data, inputting the second batch of observation value data and the second batch of operation method data into the robustness function with the network parameter data as parameters, and calculating the mathematical expectation data corresponding to each second batch operation method data;
S73: comparing the mathematical expectation data corresponding to the second batch of operation method data, and selecting the second batch operation method data with the largest mathematical expectation data as the preferred operation method data corresponding to the second batch of state data.
In this way, operation method data corresponding to the state data can be obtained.
In some embodiments, the steps S71 to S73 may be implemented by the screening module 70, that is, the screening module 70 may be configured to input the second observation value data and the second operation method data into a robust function using the network parameter data as parameters under a certain state data, and calculate mathematical expected data corresponding to each second operation method data; and comparing the mathematical expected data corresponding to the second batch of operation method data, and selecting the second batch of operation method data with the maximum mathematical expected data as a screening operation method corresponding to the state data.
In some embodiments, robot 1000 includes a processor 300. Step S71 to step S73 may be implemented by the processor 300, that is, the processor 300, when executing the computer program, inputs the second batch of observation value data and the second batch of operation method data into a robustness function using the network parameter data as parameters under the condition of certain state data, calculates mathematical expected data corresponding to each second batch of operation method data, compares the mathematical expected data corresponding to the second batch of operation method data, and selects the second batch of operation method data with the maximum mathematical expected data as the preferred operation method data corresponding to the state data.
Specifically, the above steps can be expressed by the formula:
π_θ(y) = argmax_{u∈S(N,T)} Q_θ(u, y)
wherein π_θ(y) is used to describe a function with parameter θ that maps the observation y to operation method data, u is used to describe the operation method data, y is used to describe the observation value, S(N, T) is used to describe the set of operation method data, which includes several operation method data, and Q_θ(u, y) is used to describe the robustness function.
It should be noted that, under different state data, the operation method data capable of completing the task instruction also differs, so the operation method data should be screened under certain state data and re-screened when the state data changes. For example, the robot 1000 includes a mechanical arm, and the state data indicates that a water cup is located 60 cm in front of the mechanical arm; the operation method data for the mechanical arm to touch the water cup then includes moving the mechanical arm forward by 60 cm. When the state data changes to the water cup being 120 cm in front of the mechanical arm, the operation method data for touching the water cup includes moving the mechanical arm forward by 120 cm; if the mechanical arm still only moves forward by 60 cm, it cannot touch the water cup and the task instruction cannot be completed.
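A corresponding sketch of the screening in steps S71 and S73 is a plain argmax over the candidate set, under the assumption that the robustness function Q, the selected target parameter, and the candidate operation methods are already available; the names below are hypothetical.

```python
def select_operation(Q, theta_star, candidate_operations, y):
    """pi_theta*(y): under fixed state data, pick the operation method u in
    S(N, T) that maximizes the robustness function Q_theta*(u, y) for the
    current observation y."""
    return max(candidate_operations, key=lambda u: Q(theta_star, u, y))
```

In the cup example above, the candidate operations might include "move forward 60 cm" and "move forward 120 cm"; when the observation reflects the new cup position, the maximizer changes accordingly, which is why the screening is repeated whenever the state data changes.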
In some embodiments, the control method of the robot 1000 further includes:
When the robot 1000 receives an external force, the robot 1000 generates an operation corresponding to the task instruction and the environmental object information, generates an operation instruction corresponding to the operation according to the operation, and stores the operation instruction in the operation policy set.
In this way, the set of operation policies can be modified.
In some embodiments, referring to fig. 12, the control device of the robot 1000 includes a logging module 80, where the above steps may be implemented by the logging module 80, that is, the logging module 80 may be configured to, when the robot 1000 receives an external force, cause the robot 1000 to generate an operation corresponding to a task instruction and environmental object information, generate, according to the operation, an operation instruction corresponding to the operation, and store the operation instruction into an operation policy set.
In some embodiments, robot 1000 includes a processor 300. The above steps may be implemented by the processor 300, that is, the processor 300, when executing the computer program, implements: when the robot 1000 receives an external force, causing the robot 1000 to generate an operation corresponding to the task instruction and the environmental object information, generating preferred operation method data corresponding to the operation according to the operation, and storing the preferred operation method data into the operation policy set.
Specifically, when the robot 1000 cannot complete the task instruction, force may be applied to the robot 1000 manually so that the robot 1000 can complete the task instruction. The robot 1000 may also receive an external force when it is already able to complete the task instruction, in which case the manually applied force adds alternative or better operation strategies to the operation policy set. The reasons for which the robot 1000 receives an external force are many and may be adjusted according to factors such as the required richness of the operation policy set and the convenience of manual operation, which are not limited in detail herein.
It should be noted that operation method data that cannot complete the task instruction may be stored in an interference data set, so that such data can be used as interference items during training and the corresponding operation instructions can be excluded, thereby preventing the robot 1000 from invoking an operation instruction that cannot complete the task instruction while carrying out a task instruction.
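As a minimal sketch of the bookkeeping this implies (the container and method names are hypothetical; the disclosure only requires that failed operation method data be kept apart from the operation policy set):

```python
class OperationPolicyStore:
    """Minimal bookkeeping for the operation policy set and the interference data set."""

    def __init__(self):
        self.policy_set = []        # operation instructions that completed task instructions
        self.interference_set = []  # operation method data that failed, kept as interference items

    def record(self, operation_instruction, task_completed):
        if task_completed:
            self.policy_set.append(operation_instruction)
        else:
            # failed instructions are stored separately so training can treat them
            # as interference items and the robot never invokes them for this task
            self.interference_set.append(operation_instruction)
```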
In a certain embodiment, referring to fig. 13, task-oriented operation policy set construction and model generation are completed as follows. The name or picture information of the object to be operated is acquired according to the task instruction, and the environmental object information is identified according to the acquired visual information. Then, the environmental object information and the object to be operated are analysed through the task-oriented robot 1000 data set to acquire target object information and non-target object information, and the target object information and the task scene information are acquired according to the task instruction and the environmental object information, completing the scene analysis oriented to the specific task. An operation instruction is then generated through the task-oriented robot 1000 operation strategy set according to the task instruction, the target object information and the task scene information, and the robot 1000 is controlled to operate the object according to the operation instruction, realizing the matching, reorganization and execution of operation techniques oriented to the specific task. After the robot 1000 operates the object, whether the operation is successful is judged. If the operation is unsuccessful, a force is applied to the robot 1000 so that the robot 1000 generates an operation corresponding to the task instruction and the environmental object information, an operation instruction corresponding to that operation is generated according to the operation and stored into the operation strategy set, and the matching, reorganization and execution of the task-oriented operation techniques are carried out again. If the operation is successful, the successful operation sequence is stored into the data set so that it can be conveniently invoked when the robot 1000 is next instructed to complete the task instruction, and a data set capable of completing the task instruction is generated. For example, if the robot fails to open a door because of insufficient force, the expected moment used during compliant control can be corrected through human intervention until the specific operation task is successfully completed.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by the processor 300, implements the control method of the robot 1000 of any of the above embodiments. For example, referring to fig. 1, computer readable instructions, when executed by the processor 300, cause the processor 300 to perform the steps of:
S10: acquiring the name or picture information of a target object to be operated according to a task instruction;
S20: identifying environmental object information according to the acquired visual information;
S40: analyzing the environmental object information and the name or picture information of the target object to be operated through a task-oriented robot 1000 data set to obtain target object information and task scene information;
S50: generating an operation instruction through the operation strategy set according to the task instruction, the target object information and the task scene information, and controlling the robot 1000 to operate on the target object according to the operation instruction.
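For orientation only, these steps can be read as one control loop. The sketch below is schematic: every function, attribute, and dictionary key is a hypothetical placeholder and not an interface defined by this disclosure.

```python
def run_task(robot, task_instruction, identify_environment, analyse_scene, policy_set):
    """Schematic S10-S50 loop; every collaborator is an injected callable/object."""
    # S10: obtain the name or picture information of the target object from the task instruction
    target_info = task_instruction["target"]
    # S20: identify environmental object information from the acquired visual information
    environment_info = identify_environment(robot.capture_image())
    # S40: analyse both through the task-oriented robot data set
    object_info, scene_info = analyse_scene(environment_info, target_info)
    # S50: generate an operation instruction from the operation policy set and execute it
    instruction = policy_set.generate(task_instruction, object_info, scene_info)
    robot.execute(instruction)
    return instruction
```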
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a system that includes a processing module, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
The processor 300 may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It is to be understood that portions of embodiments of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
Although the embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives and variations may be made to the embodiments described above by those of ordinary skill in the art within the scope of the application.

Claims (12)

1. A control method of a robot, comprising:
receiving a task instruction and acquiring the name or picture information of a target object to be operated according to the task instruction;
identifying environmental object information according to the acquired visual information;
analyzing the environmental object information and the name or picture information of the object to be operated through a robot data set facing the task to acquire object information and task scene information;
generating an operation instruction through a robot operation strategy set facing a task according to the task instruction, the target object information and the task scene information, and controlling the robot to operate on the target object according to the operation instruction;
the control method of the robot further comprises the following steps:
obtaining target network parameter data according to first batch operation method data, first batch state data, first batch observation value data, first batch success measurement data and a preset network parameter set, wherein the target network parameter data are used for representing the corresponding relation among the first batch operation method data, the first batch state data, the first batch observation value data and the first batch success measurement data, and the network parameter set comprises a plurality of training network parameters;
screening out preferred operation method data according to second batch operation method data, second batch state data, second batch observation value data and the target network parameter data, and storing the preferred operation method data into the operation strategy set.
2. The method according to claim 1, wherein obtaining the target network parameter data based on the first batch of operation method data, the first batch of state data, the first batch of observation data, the first batch of success metric data, and the preset network parameter set comprises:
according to the network parameter set, setting a robustness function taking the training network parameter as a parameter, wherein the robustness function is used for calculating mathematical expectations;
inputting the first batch of operation method data, the first batch of state data and the first batch of observation value data into the robustness function to obtain mathematical expected data corresponding to the training network parameters;
and screening corresponding target network parameter data in the network parameter set according to the mathematical expected data corresponding to the training network parameters and the success measurement data.
3. The method according to claim 2, wherein screening the network parameter set for the corresponding target network parameter data based on the mathematical expectation data and the success metric data corresponding to the training network parameter, comprises:
calculating the gap expectation between the mathematical expectation data corresponding to each training network parameter data and the success measurement data according to the mathematical expectation data corresponding to the training network parameter and the success measurement data;
and comparing the gap expectations corresponding to the training network parameter data, and selecting the training network parameter data with the minimum gap expectations as the target network parameter data.
4. The method of claim 2, wherein screening out preferred operation method data based on second batch of operation method data, second batch of status data, second batch of observation data, and the target network parameter data, storing the preferred operation method data into the operation policy set, comprising:
under the condition that the state data is certain, inputting the second batch of observed value data and the second batch of operation method data into a robust function taking the network parameter data as parameters, and calculating mathematical expected data corresponding to each operation method data;
and comparing mathematical expected data corresponding to each second batch of operation method data, and selecting the second batch of operation method data with the maximum mathematical expected data as preferable operation method data corresponding to the state data.
5. The method for controlling a robot according to claim 1, further comprising:
when the robot receives an external force, the robot generates an operation corresponding to the task instruction and the environmental object information, generates preferable operation method data corresponding to the operation according to the operation, and stores the preferable operation method data into the operation strategy set.
6. A control device for a robot, the control device comprising:
the first acquisition module is used for receiving the task instruction and acquiring the name or picture information of the object to be operated according to the task instruction;
the identification module is used for identifying the environmental object information according to the acquired visual information;
the second acquisition module is used for analyzing the environmental object information and the object to be operated through a robot data set facing the task to acquire object information and task scene information;
the control module is used for generating an operation instruction through a task-oriented robot operation strategy set according to the task instruction, the target object information and the task scene information, and controlling the robot to operate on the target object according to the operation instruction;
The control device of the robot further comprises a fifth acquisition module and a screening module, wherein the fifth acquisition module is used for acquiring target network parameter data according to first batch operation method data, first batch state data, first batch observation value data, first batch success measurement data and a preset network parameter set, the target network parameter data are used for representing the corresponding relation among the first batch operation method data, the first batch state data, the first batch observation value data and the first batch success measurement data, and the network parameter set comprises a plurality of training network parameters; the screening module is used for screening out preferred operation method data according to the second batch of operation method data, the second batch of state data, the second batch of observation value data and the target network parameter data, and storing the preferred operation method data into the operation strategy set.
7. The control device of the robot according to claim 6, wherein the fifth acquisition module further comprises a setting unit, an obtaining unit, and a screening unit, the setting unit being configured to set a robustness function using the training network parameter as a parameter according to the network parameter set, the robustness function being configured to calculate a mathematical expectation; the obtaining unit is used for inputting the first batch of operation method data, the first batch of state data and the first batch of observation value data into the robustness function to obtain mathematical expected data corresponding to the training network parameters; the screening unit is used for screening corresponding target network parameter data in the network parameter set according to the mathematical expected data corresponding to the training network parameter and the success measurement data.
8. The control device of the robot according to claim 7, wherein the screening unit is further configured to calculate a gap expectation between the mathematical expectation data and the success metric data corresponding to each of the training network parameter data, based on the mathematical expectation data and the success metric data corresponding to the training network parameter; and comparing the gap expectations corresponding to the training network parameter data, and selecting the training network parameter data with the minimum gap expectations as the target network parameter data.
9. The control device of the robot according to claim 7, wherein the screening module is further configured to input the second set of observation value data and the second set of operation method data into a robust function using the network parameter data as a parameter, and calculate mathematical expectation data corresponding to each of the operation method data, in the case that the status data is certain; and comparing mathematical expected data corresponding to each second batch of operation method data, and selecting the second batch of operation method data with the maximum mathematical expected data as preferable operation method data corresponding to the state data.
10. The control device of the robot according to claim 6, further comprising a storing module for causing the robot to generate an operation corresponding to the task instruction and the environmental object information when the robot receives an external force, generating preferable operation method data corresponding to the operation according to the operation, and storing the preferable operation method data into the operation policy set.
11. A robot, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the control method of the robot according to any one of claims 1 to 5 when the computer program is executed.
12. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method of controlling a robot according to any one of claims 1-5.
CN202210237386.2A 2022-03-11 2022-03-11 Robot control method, control device, robot, and storage medium Active CN114454176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210237386.2A CN114454176B (en) 2022-03-11 2022-03-11 Robot control method, control device, robot, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210237386.2A CN114454176B (en) 2022-03-11 2022-03-11 Robot control method, control device, robot, and storage medium

Publications (2)

Publication Number Publication Date
CN114454176A CN114454176A (en) 2022-05-10
CN114454176B true CN114454176B (en) 2024-03-12

Family

ID=81416681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210237386.2A Active CN114454176B (en) 2022-03-11 2022-03-11 Robot control method, control device, robot, and storage medium

Country Status (1)

Country Link
CN (1) CN114454176B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115922731B (en) * 2023-01-09 2023-05-30 深圳鹏行智能研究有限公司 Control method of robot and robot

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008068348A (en) * 2006-09-13 2008-03-27 National Institute Of Advanced Industrial & Technology Robot work teaching system and work teaching method for robot
US9840007B1 (en) * 2014-08-25 2017-12-12 X Development Llc Robotic operation libraries
JP2019155546A (en) * 2018-03-14 2019-09-19 オムロン株式会社 Control device, control method, and control program
CN111015655A (en) * 2019-12-18 2020-04-17 深圳市优必选科技股份有限公司 Mechanical arm grabbing method and device, computer readable storage medium and robot
KR20200059111A (en) * 2018-11-20 2020-05-28 한양대학교 산학협력단 Grasping robot, grasping method and learning method for grasp based on neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2577312B (en) * 2018-09-21 2022-07-20 Imperial College Innovations Ltd Task embedding for device control

Also Published As

Publication number Publication date
CN114454176A (en) 2022-05-10

Similar Documents

Publication Publication Date Title
CN111055281B (en) ROS-based autonomous mobile grabbing system and method
US8954195B2 (en) Hybrid gesture control haptic system
US10166673B2 (en) Portable apparatus for controlling robot and method thereof
WO2020241037A1 (en) Learning device, learning method, learning program, automatic control device, automatic control method, and automatic control program
US11648678B2 (en) Systems, devices, articles, and methods for calibration of rangefinders and robots
JP2022542239A (en) Autonomous Task Execution Based on Visual Angle Embedding
US20190054631A1 (en) System and method for operating and controlling a hyper configurable humanoid robot to perform multiple applications in various work environments
CN114454176B (en) Robot control method, control device, robot, and storage medium
CN113412178A (en) Robot control device, robot system, and robot control method
CN114800535A (en) Robot control method, mechanical arm control method, robot and control terminal
CN114740835A (en) Path planning method, path planning device, robot, and storage medium
CN114330755B (en) Data set generation method and device, robot and storage medium
CN114868093A (en) Handheld device, system and method for training at least one movement and at least one activity of a machine
CN117073662A (en) Map construction method, device, robot and storage medium
CN113752236B (en) Device, calibration rod and method for teaching mechanical arm
CN115014824A (en) Robot test system, method and computer readable storage medium
CN112824060B (en) Machining route generating device and method
Guo et al. Aerial Interaction with Tactile Sensing
CN115922731B (en) Control method of robot and robot
CN115446844B (en) Robot control method, robot and control terminal
US20230384788A1 (en) Information processing device, information processing system, information processing method, and recording medium storing program
CN113316505B (en) Image analysis system
US20230004170A1 (en) Modular control system and method for controlling automated guided vehicle
CN111152230B (en) Robot teaching method, system, teaching robot and storage medium
US11731278B1 (en) Robot teleoperation using mobile device motion sensors and web standards

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant