CN113524166A - Robot control method and device based on artificial intelligence and electronic equipment - Google Patents

Robot control method and device based on artificial intelligence and electronic equipment

Info

Publication number
CN113524166A
Authority
CN
China
Prior art keywords
task
motion data
progress
robot
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110023393.8A
Other languages
Chinese (zh)
Other versions
CN113524166B (en)
Inventor
郑宇
张丹丹
魏磊
张正友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110023393.8A priority Critical patent/CN113524166B/en
Publication of CN113524166A publication Critical patent/CN113524166A/en
Application granted granted Critical
Publication of CN113524166B publication Critical patent/CN113524166B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661 Programme controls characterised by task planning, object-oriented languages
    • B25J9/1664 Programme controls characterised by motion, path, trajectory planning
    • B25J9/1679 Programme controls characterised by the tasks executed
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The application provides an artificial intelligence-based robot control method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring an image of the environment where a robot is located while the robot performs a task, the task comprising a plurality of cascaded task levels, each task level comprising a plurality of candidate task schedules; traversing the plurality of task levels according to the image and determining, among the candidate task schedules included in each traversed task level, the task progress of the robot; performing motion planning processing according to the image to obtain motion data of the robot; determining target motion data according to the task progress in the plurality of task levels and the motion data; and controlling the robot according to the target motion data. The method and apparatus improve the robustness of robot control and the success rate of task execution.

Description

Robot control method and device based on artificial intelligence and electronic equipment
Technical Field
The present disclosure relates to artificial intelligence and big data technologies, and in particular, to a robot control method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium.
Background
A Robot is an intelligent machine capable of semi-autonomous or fully autonomous operation. It is used to assist or even replace human beings in performing specific tasks, thereby serving human life and extending or expanding the range of human activities and abilities. Robots can be widely applied in application scenarios such as daily life and industrial manufacturing.
In the solutions provided in the related art, a series of control commands is usually preset inside the robot to control the robot to perform a specific task. However, this scheme has poor robustness and cannot cope with unpredictable emergencies in actual tasks, so the success rate of task execution is low.
Disclosure of Invention
The embodiment of the application provides a robot control method and device based on artificial intelligence, an electronic device and a computer readable storage medium, which can improve the robustness of robot control and improve the success rate of task execution.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a robot control method based on artificial intelligence, which comprises the following steps:
acquiring an image of an environment where a robot is located in the process of executing a task by the robot; wherein the task comprises a plurality of task levels in cascade connection, and each task level comprises a plurality of candidate task schedules;
traversing the plurality of task levels according to the image to determine the task progress of the robot in a plurality of candidate task progresses included in the traversed task levels;
performing motion planning processing according to the image to obtain motion data of the robot;
determining target motion data according to the task progress in the plurality of task levels and the motion data;
and controlling the robot according to the target motion data.
An embodiment of the application provides an artificial intelligence-based robot control apparatus, including:
the acquisition module is used for acquiring images of the environment where the robot is located in the process of executing tasks by the robot; wherein the task comprises a plurality of task levels in cascade connection, and each task level comprises a plurality of candidate task schedules;
the progress determining module is used for traversing the plurality of task levels according to the image so as to determine the task progress of the robot in a plurality of candidate task progresses included in the traversed task levels;
the motion planning module is used for carrying out motion planning processing according to the image to obtain motion data of the robot;
the combination module is used for determining target motion data according to the task progress in the plurality of task levels and the motion data;
and the control module is used for controlling the robot according to the target motion data.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the robot control method based on artificial intelligence provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute, so as to implement the robot control method based on artificial intelligence provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
the task is divided into a plurality of cascaded task levels, in the process of executing the task by the robot, the task progress of the robot is determined step by step according to the image of the environment where the robot is located, the target motion data is determined by combining the motion data obtained by motion planning processing, the accuracy and the interpretability of controlling the robot according to the target motion data can be improved, and the success rate of executing the task is improved.
Drawings
FIG. 1 is a schematic diagram of an architecture of an artificial intelligence based robot control system provided in an embodiment of the present application;
fig. 2 is a schematic architecture diagram of a terminal device provided in an embodiment of the present application;
fig. 3A is a schematic flowchart of a robot control method based on artificial intelligence according to an embodiment of the present application;
fig. 3B is a schematic flowchart of a robot control method based on artificial intelligence according to an embodiment of the present application;
fig. 3C is a schematic flowchart of a robot control method based on artificial intelligence according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a robot control method based on artificial intelligence provided by an embodiment of the present application;
FIG. 5 is a block diagram of an architecture for data collection provided by an embodiment of the present application;
FIG. 6 is a block diagram of an architecture for data collection provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a neural network model corresponding to three levels provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an online real-time control of a robot according to an embodiment of the present disclosure.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first \ second \ third" are used only to distinguish similar objects and do not denote a particular order; it is understood that, where permitted, "first \ second \ third" may be interchanged in a specific order or sequence so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein. In the following description, the term "plurality" means at least two.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Artificial Intelligence (AI): a theory, method, technique and application system for simulating, extending and expanding human intelligence, sensing environment, acquiring knowledge and using knowledge to obtain optimal results by using a digital computer or a machine controlled by a digital computer. In the embodiment of the application, models such as a progress classification model and a motion planning model can be constructed based on artificial intelligence principles (such as machine learning principles).
2) The robot comprises: an intelligent machine capable of semi-autonomous or fully autonomous operation. The structure of the robot is not limited in the embodiments of the present application, and for example, the robot may be a sweeping robot or a robot having a robot arm, and for the robot having the robot arm, the robot may have 6 degrees of freedom (for example, 6 joints) or 7 degrees of freedom.
3) Task: refers to the task to be performed by the robot, which may be, for example, pouring water, cooking, cleaning, nursing, or sweeping. In an embodiment of the application, the task comprises a plurality of cascaded task levels, and each task level comprises a plurality of candidate task schedules. The task levels can be divided according to the actual application scenario; for example, a water pouring task (controlling a source container to pour water into a target container) can be divided into a stage task hierarchy and a state task hierarchy, wherein the stage task hierarchy comprises 4 candidate task schedules, namely [ controlling the source container to approach the target container ], [ starting to pour water ], [ slowing down the speed of pouring water ], and [ ending the pouring and removing the source container ]; the status task hierarchy includes 10 candidate task schedules, namely that the internal space of the target container has been filled with water by 10%, 20%, ..., and 100%.
4) Image of environment in which the robot is located: the image may be an image observed from the viewpoint of the robot itself (i.e., the first person's viewpoint), or an image observed from a third person's viewpoint, for example, an image captured by a camera independent of the robot. The acquired image of the environment in which the robot is located may not include any structure of the robot, and may include all or part of the structure of the robot.
5) Motion data: in essence, it is kinematic data, and for a robot, the data types of the motion data of the robot include, but are not limited to, pose values, angular velocities, and moment values.
6) The mock object: refers to the imitating target of the robot, and the imitating object can execute the task. In the embodiment of the application, the simulation object may be a user or another robot.
7) Big Data (Big Data): the data set which can not be captured, managed and processed by a conventional software tool in a certain time range is a massive, high-growth-rate and diversified information asset which can have stronger decision-making power, insight discovery power and flow optimization capability only by a new processing mode. The method is suitable for the technology of big data, and comprises a large-scale parallel processing database, data mining, a distributed file system, a distributed database, a cloud computing platform, the Internet and an extensible storage system. In the embodiment of the application, the big data technology can be utilized to realize model training and real-time robot control.
The embodiment of the application provides a robot control method and device based on artificial intelligence, an electronic device and a computer readable storage medium, which can improve the robustness of robot control and improve the success rate of task execution. An exemplary application of the electronic device provided in the embodiment of the present application is described below, and the electronic device provided in the embodiment of the present application may be implemented as various types of terminal devices, and may also be implemented as a server.
Referring to fig. 1, fig. 1 is an architecture diagram of an artificial intelligence based robot control system 100 provided in an embodiment of the present application, a terminal device 400 is connected to a server 200 through a network 300, and the server 200 is connected to a database 500, where the network 300 may be a wide area network or a local area network, or a combination of both. In fig. 1, taking the task of pouring water as an example, a robot 600 (the robot 600 shown in fig. 1 may be a wrist joint in one robot), a source container 700, and a target container 800 are also shown.
In some embodiments, taking the electronic device as a terminal device as an example, the robot control method based on artificial intelligence provided in the embodiments of the present application may be implemented by the terminal device. For example, the terminal device 400 runs the client 410, and the client 410 may be a client for controlling the robot 600. The client 410 may collect a sample image of an environment where the robot 600 is located, a sample task progress corresponding to the sample image, and sample motion data for controlling the robot 600 at a time corresponding to the sample image during the course of the robot 600 performing the historical task. The client 410 trains a progress classification model corresponding to a task level where the sample task progress is located according to the sample image and the sample task progress, and trains a motion planning model according to the sample image and the sample motion data. It should be noted that, during the course of the robot 600 executing the historical task, the user may generate a customized control instruction through the client 410 to control the robot, so as to ensure the accuracy of the sample image, the sample task progress, and the sample motion data.
When the robot 600 needs to be controlled to execute a task in real time, the client 410 collects an image of the environment where the robot 600 is located, wherein the client 410 can shoot the environment where the robot 600 is located through a camera inside the terminal device 400 to obtain the image; an image obtained by shooting the environment where the robot 600 is located by a camera independent of the terminal device 400 and the robot 600 may be acquired (acquired); an image obtained by shooting the outside with a camera inside the robot 600 (i.e., a first-person image of the robot 600) may be acquired. Then, for each task level included in the task, the client 410 performs progress classification processing on the acquired image according to the trained progress classification model corresponding to the task level, so as to obtain the task progress of the robot 600 at the task level. Meanwhile, the client 410 performs motion planning processing on the acquired image according to the trained motion planning model to obtain motion data. The client 410 determines target motion data according to the task progress and the motion data in the plurality of task levels, and generates a control instruction according to the target motion data to control the robot 600.
In some embodiments, taking the electronic device as a server as an example, the robot control method based on artificial intelligence provided in the embodiments of the present application may be cooperatively implemented by the server and the terminal device. For example, the server 200 obtains, from the database 500, a sample image of the environment where the robot 600 is located, which is collected historically, a sample task progress corresponding to the sample image, and sample motion data for controlling the robot 600 at a time corresponding to the sample image. And then training a motion planning model and a progress classification model corresponding to each task level according to the acquired data.
For the task to be executed, the server 200 obtains the image of the environment where the robot 600 is located, which is acquired by the client 410, from the client 410. Then, the server 200 performs progress classification processing on the acquired images according to the trained progress classification model corresponding to each task level, so as to obtain the task progress of the robot 600 at each task level. Meanwhile, the server 200 performs motion planning processing on the acquired image according to the trained motion planning model to obtain motion data. The server 200 determines target motion data according to the task progress and the motion data in the plurality of task levels, and sends a control instruction generated according to the target motion data to the client 410, so that the client 410 controls the robot 600 according to the control instruction; alternatively, the server 200 may transmit the target motion data to the client 410, so that the client 410 generates a control instruction according to the target motion data, and controls the robot 600 according to the control instruction. In some embodiments, the server 200 may also send the trained motion planning model and the trained progress classification model corresponding to each task level to the client 410, so that the client 410 performs prediction processing (i.e., motion planning processing and progress classification processing) locally to determine the target motion data.
In fig. 1, the robot 600 may perform a water pouring task of pouring water in the source container 700 into the destination container 800 in a controlled process, and may be applied to bars, coffee shops, restaurants, etc. instead of manual work. It should be noted that, in fig. 1, the terminal device 400 is illustrated as being independent from the robot 600, and in some embodiments, the terminal device 400 may also be integrated into the robot 600 (i.e., the terminal device 400 is a component inside the robot 600), so that the robot 600 itself has a self-control capability.
In some embodiments, the terminal device 400 or the server 200 may implement the artificial intelligence based robot control method provided by the embodiments of the present application by running a computer program, for example, the computer program may be a native program or a software module in an operating system; can be a local (Native) Application program (APP), i.e. a program that needs to be installed in an operating system to run; or may be an applet, i.e. a program that can be run only by downloading it to the browser environment; but also an applet that can be embedded into any APP. In general, the computer programs described above may be any form of application, module or plug-in.
In some embodiments, the server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a Cloud server providing basic Cloud computing services such as a Cloud service, a Cloud database, Cloud computing, a Cloud function, Cloud storage, a web service, Cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, where Cloud Technology (Cloud Technology) refers to a hosting Technology for unifying resources of hardware, software, a network, and the like in a wide area network or a local area network to implement computing, storage, processing, and sharing of data. The terminal device 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
The following description takes the case where the electronic device provided in the embodiment of the present application is implemented as a terminal device as an example; it can be understood that, when the electronic device is a server, parts of the structure shown in fig. 2 (such as the user interface, the presentation module, and the input processing module) may be omitted. Referring to fig. 2, fig. 2 is a schematic structural diagram of a terminal device 400 provided in an embodiment of the present application. The terminal device 400 shown in fig. 2 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The various components in the terminal device 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among these components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as the bus system 440 in fig. 2.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 may be volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating to other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided in the embodiments of the present application may be implemented in software, and fig. 2 illustrates an artificial intelligence based robot control apparatus 455 stored in a memory 450, which may be software in the form of programs and plug-ins, and the like, and includes the following software modules: an acquisition module 4551, a progress determination module 4552, a motion planning module 4553, a combination module 4554, and a control module 4555, which are logical and thus can be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be explained below.
The robot control method based on artificial intelligence provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the electronic device provided by the embodiment of the present application.
Referring to fig. 3A, fig. 3A is a schematic flowchart of a robot control method based on artificial intelligence according to an embodiment of the present application, and will be described with reference to the steps shown in fig. 3A.
In step 101, acquiring an image of an environment where the robot is located in the process of executing a task by the robot; the task comprises a plurality of cascaded task levels, and each task level comprises a plurality of candidate task schedules.
In the embodiment of the present application, the type of the task performed by the robot is not limited; it may be, for example, a water pouring task or a cleaning task. The task executed by the robot comprises a plurality of cascaded task levels, each task level comprises a plurality of candidate task schedules, and the task levels and the candidate task schedules can be set according to the actual application scenario. For example, the water pouring task comprises a stage task level and a state task level, wherein the stage task level comprises 4 candidate task schedules (or stages), namely [ control source container to approach target container ], [ start pouring water ], [ slow down pouring water speed ], and [ end pouring water and remove source container ]; the candidate task schedules included in the state task level represent the state in which the internal space of the target container is filled with water, such as 10% (i.e., the internal space of the target container has been filled with water to 10%), 20%, ..., 100%. For another example, the cleaning task includes a stage task level and a state task level, where the stage task level includes 3 candidate task schedules, namely [ control rag to approach the desktop to be cleaned ], [ start cleaning ], and [ end cleaning and remove rag ]; the candidate task schedules included in the state task level represent the currently cleaned region, such as the nth region (taking the example of dividing the desktop into a plurality of regions), where n is an integer greater than 0. A schematic data-structure sketch of such cascaded task levels is given below.
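For ease of understanding, the following Python sketch shows one possible in-memory representation of the cascaded task levels and candidate task schedules of the water pouring example. It is a non-limiting illustration only: the class names, field names, and the choice of which schedules carry a single motion datum are assumptions made for this sketch, not details disclosed by the embodiments of the application.

```python
# Illustrative sketch of cascaded task levels (water-pouring example).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CandidateProgress:
    name: str
    # Optional single motion datum; when set, traversal can stop at this
    # schedule and the robot is controlled directly with this value.
    single_motion_data: Optional[float] = None

@dataclass
class TaskLevel:
    name: str
    candidates: List[CandidateProgress] = field(default_factory=list)

# Stage task level: 4 candidate task schedules.
stage_level = TaskLevel("stage", [
    CandidateProgress("approach target container", single_motion_data=0.0),
    CandidateProgress("start pouring"),
    CandidateProgress("slow down pouring"),
    CandidateProgress("end pouring and remove source container", single_motion_data=0.0),
])

# State task level: 10 candidate task schedules (10%, 20%, ..., 100% filled).
state_level = TaskLevel("state", [CandidateProgress(f"{p}% filled") for p in range(10, 101, 10)])

pouring_task = [stage_level, state_level]  # cascaded, higher level first
```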
In the process of executing tasks by the robot, acquiring images of the environment where the robot is located, wherein the images can be images obtained by shooting the environment where the robot is located through a camera independent of the robot, namely images obtained by shooting at a third person called a visual angle; the image may be an image obtained by imaging the environment in which the robot is located by a camera inside the robot, that is, an image obtained by imaging the robot from the first-person perspective. The image of the environment in which the robot is located may not include any structure of the robot itself, and may include all or part of the structure of the robot.
In step 102, a plurality of task levels are traversed according to the image, and the task progress of the robot is determined in a plurality of candidate task progresses included in the traversed task levels.
Here, a plurality of task levels included in a task have a set hierarchical order, and taking a task including a stage task level and a state task level as an example, in the hierarchical order, the stage task level is a higher-level task level and the state task level is a lower-level task level. When the image of the environment where the robot is located is acquired, a plurality of task levels are traversed according to the level sequence, for example, the first traversed task level is a stage task level, and the second traversed task level is a state task level.
And for the traversed task level, determining the task progress of the robot according to the acquired image in a plurality of candidate task progresses. Here, the task progress in the task hierarchy of the robot may be determined through a progress classification model corresponding to the task hierarchy, which is described in detail later.
In some embodiments, the above-mentioned determining the task progress of the robot among the plurality of candidate task progresses included in the traversed task hierarchy may be implemented in such a manner that: when the task progress in the last traversed task level and at least part of candidate task progress included in the traversed task level have a cascade relation, determining the task progress of the robot in at least part of the candidate task progress according to the image; and when the task progress in the last traversed task level and all candidate task progresses included in the traversed task level have no cascade relation, stopping the traversal, and determining that the task progress in the traversed task level and the task progress in the subsequent task level which is not traversed is empty.
In the embodiment of the present application, one condition is that a cascade relationship exists between a certain candidate task progress in a certain task level and at least part of candidate task progress in a next task level, and the cascade relationship may be preset according to an actual application scenario. Taking the water pouring task as an example, it can be set that candidate task schedules of [ start water pouring ] in the stage task hierarchy have a cascade relationship with candidate task schedules of 10%, 20%, 30%, 40%, 50% and 60% in the status task hierarchy; candidate task schedules of [ slowing down the speed of water pouring ] in the stage task hierarchy are in cascade relation with 70%, 80%, 90% and 100% of these candidate task schedules in the status task hierarchy.
In another case, a candidate task progress in a certain task level and all candidate task progresses in a next task level have no cascade relation. By taking a water pouring task as an example, the candidate task progress of a [ control source container is close to a target container ] in a stage task level and all candidate task progress in a state task level do not have a cascade relation; candidate task schedules of [ ending water pouring and removing a source container ] in the stage task hierarchy have no cascade relation with all candidate task schedules in the state task hierarchy.
For these two cases, the task progress of the robot in the traversed task level is determined in different ways. For example, when the task progress in the last traversed task level has a cascade relationship with at least part of the candidate task schedules included in the currently traversed task level, the task progress of the robot is determined, according to the image, among those candidate task schedules, and the traversal continues; when the task progress in the last traversed task level has no cascade relationship with any candidate task schedule included in the currently traversed task level, the traversal is stopped, and the task progress in the currently traversed task level and in all subsequent, untraversed task levels is determined to be empty, i.e., these task progresses are not used when determining the target motion data. In this way, the accuracy and reasonableness of the determined task progress can be improved by taking into account whether a cascade relationship exists between candidate task schedules of different task levels.
In some embodiments, in traversing a plurality of task levels from an image, further comprising: stopping traversing when the task progress in the traversed task hierarchy corresponds to a single motion data; and controlling the robot according to the motion data corresponding to the task progress in the traversed task hierarchy.
In the embodiment of the application, some candidate task schedules of some task hierarchies may correspond to a set single motion data, and the motion data corresponding to different candidate task schedules may be the same or different. When the task progress in the traversed task hierarchy corresponds to a single motion data, the traversal is stopped, and the robot is controlled directly according to the motion data, i.e. without performing the subsequent steps 103 to 105. For example, a water pouring task is used, a candidate task progress [ a control source container is close to a target container ] in a stage task level can correspond to a set motion data, a candidate task progress [ a water pouring ending and source container moving removing ] can also correspond to a set motion data, and the two motion data can be the same or different and are determined according to an actual application scene. By the aid of the method, the flexibility of robot control can be improved, and control requirements of different candidate task schedules are met.
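For ease of understanding, the following sketch combines the traversal of step 102 with the cascade relationship and the single-motion-datum early stop described above. It reuses the illustrative TaskLevel/CandidateProgress sketch given earlier and assumes a `classify(image, level, allowed)` helper that wraps the progress classification model of a level and returns one of the allowed candidates; the cascade map values and all names are assumptions for illustration only.

```python
# Example cascade relation for the water-pouring task: stage schedule -> the
# state schedules it cascades to (an empty list means no cascade relation).
CASCADE = {
    "approach target container": [],
    "start pouring": ["10% filled", "20% filled", "30% filled",
                      "40% filled", "50% filled", "60% filled"],
    "slow down pouring": ["70% filled", "80% filled", "90% filled", "100% filled"],
    "end pouring and remove source container": [],
}

def traverse_levels(image, levels, classify):
    """Return {level name: determined task progress or None}; may also contain
    a 'direct_motion_data' entry when traversal stops at a single motion datum."""
    progress = {}
    prev = None
    for i, level in enumerate(levels):
        allowed = [c.name for c in level.candidates] if i == 0 else CASCADE.get(prev.name, [])
        if not allowed:
            # No cascade relation: this level and all later levels stay empty.
            for rest in levels[i:]:
                progress[rest.name] = None
            break
        chosen = classify(image, level, allowed)   # progress classification model
        progress[level.name] = chosen
        if chosen.single_motion_data is not None:
            # Schedule corresponds to a single motion datum: stop traversing
            # and control the robot directly with that datum.
            progress["direct_motion_data"] = chosen.single_motion_data
            break
        prev = chosen
    return progress
```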
In step 103, motion planning processing is performed according to the image, and motion data of the robot is obtained.
In addition to determining the task progress of the robot in each task level according to the image, in the embodiment of the present application, the motion planning processing is performed according to the image to obtain the motion data of the robot, where the motion planning processing may be implemented according to a motion planning model, which is specifically described below.
In step 104, target motion data is determined according to the task progress and the motion data in the plurality of task hierarchies.
Here, the motion data is constrained according to the task progress in the plurality of task levels, and target motion data is obtained. For example, a motion data range (which may be preset or obtained in another way) corresponding to a task progress in a last traversed task hierarchy is determined, and when the motion data of the robot is successfully matched with the motion data range, the motion data is used as target motion data; when the matching of the motion data of the robot and the motion data range fails, determining target motion data according to the motion data range, for example, using any motion data in the motion data range as the target motion data. Of course, the manner of determining the target motion data in combination with the task progress and the motion data in the plurality of task hierarchies is not limited thereto.
In step 105, the robot is controlled according to the target motion data.
The data type of the target motion data is not limited in the embodiment of the application, and the data type may be a pose value, an angular velocity value, a moment value or the like. After the target motion data is determined, a control instruction is determined through an inverse kinematics principle, and the robot is controlled according to the control instruction, wherein the control instruction comprises control data (control quantity) which can be directly read and executed by the robot, and the control data is rotation angles of a plurality of joints of the robot. Of course, if the target motion data can be directly read and executed by the robot, the target motion data may be directly used as the control data in the control command.
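As a schematic illustration of step 105, the sketch below converts target motion data (assumed here to be an end-effector pose) into joint angles that the robot can read and execute, using an inverse kinematics routine. `ik_solve` is a placeholder for whatever IK solver the robot stack provides; its name, signature, and the fallback behaviour are assumptions of this sketch.

```python
import numpy as np

def build_control_command(target_pose: np.ndarray, current_joints: np.ndarray, ik_solve):
    """Return joint angles (the control data of the control instruction)."""
    joint_angles = ik_solve(target_pose, seed=current_joints)  # inverse kinematics
    if joint_angles is None:
        # No IK solution found: fall back to holding the current configuration.
        return current_joints
    return joint_angles
```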
It should be noted that, steps 101 to 105 may be performed periodically, that is, images of the environment where the robot is located are collected periodically, and target motion data is determined to control the robot, so that real-time and accurate control can be achieved.
As shown in fig. 3A, in the embodiment of the present application, the task progress of the robot is determined step by step according to the image of the environment where the robot is located, the motion planning processing is performed to obtain the motion data, and then the target motion data is determined by combining the task progress and the motion data in a plurality of task levels, so that the accuracy and the interpretability of the determined target motion data are improved, and the success rate of the robot executing the task is also improved.
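To illustrate the periodic execution of steps 101 to 105 as a whole, the following sketch strings together the helper functions sketched in this description (traverse_levels, build_control_command, and the plan_motion / constrain_motion helpers sketched later). The camera and robot interfaces, the loop period, and all helper names are assumptions for illustration; they are not part of the disclosed embodiments.

```python
import time

def control_loop(camera, robot, levels, classify, plan_motion, constrain_motion,
                 ik_solve, period_s=0.1):
    while not robot.task_done():
        image = camera.capture()                               # step 101
        progress = traverse_levels(image, levels, classify)    # step 102
        if "direct_motion_data" in progress:
            # A schedule with a single motion datum: control directly.
            robot.execute(progress["direct_motion_data"])
            time.sleep(period_s)
            continue
        motion = plan_motion(image)                            # step 103
        target = constrain_motion(motion, progress)            # step 104
        command = build_control_command(target, robot.joint_angles(), ik_solve)
        robot.execute(command)                                 # step 105
        time.sleep(period_s)
```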
In some embodiments, referring to fig. 3B, fig. 3B is a flowchart of an artificial intelligence based robot control method provided in an embodiment of the present application, and step 104 shown in fig. 3A may be implemented by steps 201 to 202, which will be described in conjunction with the steps.
In step 201, for the task progress in each task hierarchy, motion data is updated according to the motion data range corresponding to the task progress.
In the embodiment of the application, the task progress in the plurality of task hierarchies may correspond to one motion data range respectively. The motion data range can be set manually or determined in other manners, and the motion data ranges corresponding to the task schedules in different task levels can be the same or different.
After the motion data is obtained by performing motion planning processing according to the image, the motion data can be updated according to the motion data range corresponding to the task progress in each task level, so that the motion data can be restricted in a specific motion data range, namely, safety restriction is realized. In some embodiments, the task progress in the plurality of task levels may be traversed, and the motion data is updated according to the motion data range corresponding to the traversed task progress, where the traversing may be performed according to a level order, or may be performed according to an opposite order of the level order, which is not limited to this.
In some embodiments, the above-mentioned updating process of the motion data according to the motion data range corresponding to the task progress may be implemented in such a manner that: when the motion data is successfully matched with the motion data range corresponding to the task progress, keeping the motion data unchanged; and when the motion data fails to be matched with the motion data range corresponding to the task progress, determining new motion data according to the motion data range corresponding to the task progress.
Here, an example of the update processing is provided, and for convenience of understanding, a process of performing the update processing on the motion data is explained with a motion data range corresponding to a task progress in any one task hierarchy. First, the motion data is matched with a motion data range corresponding to a task progress, where a successful match may mean that the motion data falls in the motion data range, and a failed match may mean that the motion data does not fall in the motion data range, that is, the motion data is greater than the largest motion data in the motion data range or less than the smallest motion data in the motion data range, which, of course, does not constitute a limitation on the embodiment of the present application.
When the motion data is successfully matched with the motion data range, keeping the motion data unchanged; when the matching of the motion data and the motion data range fails, any one of the motion data (such as the maximum motion data or the minimum motion data) in the motion data range is used as new motion data, so that the new motion data can be successfully matched with the motion data range. By the method, the motion data can be effectively restrained in the updating process.
In some embodiments, when the motion data fails to match with the motion data range corresponding to the task progress, determining new motion data according to the motion data range corresponding to the task progress may be implemented in such a manner that: performing at least one of: when the motion data is smaller than the minimum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and taking the minimum motion data as new motion data; and when the motion data is larger than the maximum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and taking the maximum motion data as new motion data.
In the embodiment of the application, for different task schedules, different constraint modes, such as a minimum value constraint and a maximum value constraint on the motion data, can be set. For convenience of understanding, the process of updating the motion data according to the motion data range corresponding to the task progress in any one task level is described below.
If the constraint mode corresponding to the task progress is minimum constraint, when the motion data is smaller than the minimum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and using the minimum motion data as new motion data, namely, the purpose of the minimum constraint is to avoid the motion data being too small.
If the constraint mode corresponding to the task progress is maximum value constraint, when the motion data is larger than the maximum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and taking the maximum motion data as new motion data, namely the purpose of the maximum value constraint is to avoid overlarge motion data. Wherein, the larger the motion data is, the larger the control amplitude of the robot is.
For example, in the water pouring task, for the stage [ start pouring water ], the corresponding constraint mode is set to be the minimum value constraint, so that the water pouring speed is prevented from being too slow; and for the stage [ slowing down the water pouring speed ], setting the corresponding constraint mode to be maximum value constraint, thereby preventing the water pouring speed from being too high.
It should be noted that, for the same task progress, the corresponding constraint mode may also be set to include both the minimum value constraint and the maximum value constraint. By means of the method, the flexibility of updating the motion data can be improved, and the corresponding constraint mode can be set in a targeted manner according to the characteristics of different task schedules in an actual application scene.
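For ease of understanding, the following sketch applies the safety constraint of step 201: each determined task progress may carry a motion data range and a constraint mode (minimum, maximum, or both), and the planned motion data is updated accordingly. The range values, the dictionary layout, and the function name constrain_motion are assumptions made for this illustration.

```python
RANGES = {
    # task progress: (minimum motion data, maximum motion data, constraint mode)
    "start pouring":     (0.05, 0.60, "min"),   # minimum constraint: avoid pouring too slowly
    "slow down pouring": (0.00, 0.20, "max"),   # maximum constraint: avoid pouring too fast
}

def constrain_motion(motion_data: float, progress: dict) -> float:
    """Update the planned motion data using the ranges of the determined task progress."""
    for level_name, chosen in progress.items():
        if chosen is None or level_name == "direct_motion_data":
            continue
        name = chosen.name if hasattr(chosen, "name") else chosen
        spec = RANGES.get(name)
        if spec is None:
            continue
        lo, hi, mode = spec
        if mode in ("min", "both") and motion_data < lo:
            motion_data = lo   # matching failed: take the minimum motion data as new motion data
        if mode in ("max", "both") and motion_data > hi:
            motion_data = hi   # matching failed: take the maximum motion data as new motion data
    return motion_data
```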
In some embodiments, when the data types of the motion data ranges respectively corresponding to the task schedules in different task hierarchies are different, before performing update processing on the motion data according to the motion data range corresponding to the task schedule, the method further includes: and when the data type of the motion data fails to be matched with the data type corresponding to the motion data range corresponding to the task progress, performing data type conversion processing on the motion data according to the data type corresponding to the motion data range corresponding to the task progress.
In the embodiment of the present application, the task schedules in the multiple task hierarchies may correspond to motion data ranges for different data types, for example, the task schedules in the stage task hierarchy of the robot correspond to motion data ranges for pose values, such as [ pose value 1, pose value 2 ]; the task progress of the robot in the status task hierarchy corresponds to a range of motion data for angular velocities, e.g., [ angular velocity 1, angular velocity 2 ]. The range of the motion data can be set manually or determined in other ways. Therefore, the constraint of different dimensions can be carried out on the motion data, and the safety of robot control is guaranteed to the greatest extent.
Before updating the motion data according to the motion data range corresponding to the task progress in any task level, firstly judging whether the data type of the motion data is the same as the data type corresponding to the motion data range, and if so, updating the motion data according to the motion data range; if the motion data range is different from the motion data range, performing data type conversion processing on the motion data according to the data type corresponding to the motion data range, and then performing update processing on the motion data after the data type conversion processing according to the motion data range. For example, if the data type of the motion data is an angular velocity and the data type of the motion data range is a pose value, the data type of the motion data is first converted from the angular velocity to the pose value (i.e., data type conversion processing is performed), and then the motion data with the data type of the pose value is updated according to the motion data range.
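The data-type check described above could be sketched as follows; the converter registry and the example conversion (integrating an angular velocity over one control period and adding it to the current pose value) are rough assumptions for illustration, and the actual conversion depends on the robot model and the data types used.

```python
CONVERTERS = {
    # (data type of motion data, data type of range) -> conversion function
    ("angular_velocity", "pose"): lambda w, current_pose=0.0, dt=0.1: current_pose + w * dt,
}

def to_range_type(value, value_type, range_type):
    """Convert motion data to the data type of the motion data range, if needed."""
    if value_type == range_type:
        return value
    convert = CONVERTERS.get((value_type, range_type))
    if convert is None:
        raise ValueError(f"no conversion from {value_type} to {range_type}")
    return convert(value)
```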
In step 202, the motion data obtained by performing update processing according to the task progress in the plurality of task levels is used as the target motion data.
For example, when traversing the task schedules in the plurality of task levels is completed, the motion data obtained by sequentially performing update processing according to the task schedules in the plurality of task levels is used as the target motion data.
As shown in fig. 3B, in the embodiment of the application, security constraint is performed on motion data through motion data ranges respectively corresponding to task schedules in a plurality of task levels, so that the security of the determined target motion data can be effectively improved.
In some embodiments, referring to fig. 3C, fig. 3C is a schematic flowchart of a robot control method based on artificial intelligence provided in an embodiment of the present application, and based on fig. 3A, before step 101, in step 301, during the course of the robot performing a historical task, a sample image of an environment where the robot is located, a sample task progress corresponding to the sample image, and sample motion data for controlling the robot at a time corresponding to the sample image are acquired.
In the embodiment of the application, the model can be constructed based on the artificial intelligence principle, so that the task progress and the motion data are determined. First, during the execution of the historical task by the robot, a sample image of the environment where the robot is located, a task progress corresponding to the sample image (named as sample task progress for convenience of distinction), and sample motion data for controlling the robot at a time corresponding to the sample image are collected (which may be periodically collected) to train the model. The historical task is a generic term of one or more tasks executed before the task in step 101, and the historical task is the same type as the task in step 101, such as a water pouring task. The sample task progress can be manually annotated by a user based on the sample image. The sample motion data can be obtained from the sample image, and can also be in communication connection with the robot, so that the sample motion data sent by the robot is obtained.
It should be noted that the sample motion data acquired here may be motion data of an entire structure of the robot, or may be motion data of a partial structure of the robot, such as motion data of an End effector (End-effector), where the End effector refers to a tool that is connected to any joint of the robot and has a certain function, for example, in a water pouring task, the End effector may be connected to a wrist joint of a mechanical arm of the robot to perform a function of holding and rotating the source container, and in other cases, the End effector may be regarded as being integrated with the wrist joint of the mechanical arm of the robot.
In some embodiments, after step 301, the method further includes: for each collected sample task progress, constructing a motion data range corresponding to the sample task progress according to the plurality of sample motion data corresponding to that sample task progress, so that, in the process of the robot executing the task, the target motion data is determined by combining the motion data range corresponding to the sample task progress with the motion data obtained by the motion planning processing.
For each obtained sample task progress, a motion data range corresponding to the sample task progress can be constructed according to the minimum sample motion data and the maximum sample motion data corresponding to the sample task progress, namely the motion data range is [ minimum sample motion data, maximum sample motion data ]. In this way, in the process of executing the task by the robot, if the sample task progress becomes the determined task progress, the motion data can be updated according to the motion data range corresponding to the sample task progress, that is, the target motion data is determined by combining the motion data. By the aid of the method, automatic construction of the motion data range can be achieved, labor cost is saved, and the constructed motion data range can be suitable for corresponding sample task progress.
It is worth noting that for a partial sample task schedule, what is required is not a motion data range, but a single motion data, such as a candidate task schedule in the stage task hierarchy of the task of pouring [ controlling the source container to approach the target container ] and [ ending the pouring and removing the source container ]. For the sample task progress of this type, after the corresponding motion data range is constructed, any one of the motion data (such as the largest motion data or the smallest motion data) in the motion data range may be used as a single motion data corresponding to the sample task progress, thereby implementing automatic setting.
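A minimal sketch of this range construction, assuming the collected samples are available as (sample task progress, sample motion data) pairs; the data layout and function name are illustrative assumptions.

```python
from collections import defaultdict

def build_ranges(samples):
    """samples: iterable of (sample_task_progress, sample_motion_data) pairs."""
    grouped = defaultdict(list)
    for progress_label, motion in samples:
        grouped[progress_label].append(motion)
    # Motion data range = [minimum sample motion data, maximum sample motion data].
    return {label: (min(values), max(values)) for label, values in grouped.items()}

# For schedules that need a single motion datum rather than a range, one endpoint
# of the constructed range can be reused, e.g.:
# single_value = build_ranges(samples)["approach target container"][1]  # maximum
```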
In some embodiments, during the course of the robot performing the historical task, the method further includes: any one of the following processes is performed: acquiring a control instruction aiming at the robot to control the robot; and acquiring motion data of a control object of the robot, and generating a control instruction according to the motion data of the control object so as to control the robot.
In order to ensure the accuracy and effectiveness of the collected sample image and the sample motion data, the embodiment of the application provides two ways to control the robot to execute the historical task. The first mode is a Master-slave control mode, in which the robot corresponds to a Master Controller (Master Controller), for example, the Master Controller may be a Master robot, and when a control instruction of an object (e.g., a user) for the Master Controller is obtained, the robot is controlled according to the control instruction.
The second method is a sensing method, for example, when a control object (such as a control user) of the robot wears a sensor in advance, motion data of the control object monitored by the sensor is acquired, and a corresponding control instruction is generated according to the motion data to control the robot. The motion data of the manipulation object can also be used as sample motion data of the robot at the corresponding moment.
Of course, this does not constitute a limitation to the embodiments of the present application; for example, the robot may also be controlled by drag teaching, that is, the manipulation object of the robot directly drags the robot to perform the historical task. In this way, the flexibility of controlling the robot to execute historical tasks can be improved.
In step 302, training a progress classification model corresponding to a task level where the sample task progress is located according to the sample image and the sample task progress; and the progress classification models corresponding to different task levels are different.
The progress classification model corresponding to the task level where the sample task progress is located is trained according to the acquired sample image and the corresponding sample task progress. Because the collected sample images usually correspond to a plurality of sample task progresses, and those sample task progresses involve every task level, the progress classification model corresponding to each task level can be effectively trained. The type of the progress classification model is not limited and may be, for example, a neural network model.
In the embodiment of the application, the progress classification model can be trained through a back-propagation mechanism. For a certain sample image and the corresponding sample task progress, progress classification processing is first performed on the sample image according to the progress classification model corresponding to the task level where the sample task progress is located, obtaining a task progress of the robot in that task level; for convenience of distinction, the obtained task progress is named the task progress to be compared. Then, the sample task progress and the task progress to be compared are processed according to the loss function of the progress classification model to obtain a loss value (also referred to as the difference between the sample task progress and the task progress to be compared); the type of the loss function is not limited and may be, for example, a cross-entropy loss function. Back propagation is then carried out in the progress classification model according to the obtained loss value, and the weight parameters of the progress classification model are updated along the gradient descent direction during the back propagation.
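As an illustrative sketch of such a back-propagation training step (assuming, purely for illustration, a PyTorch implementation in which the progress classification model maps a batch of sample images to class logits over the candidate task progresses of its task level):

import torch
import torch.nn as nn

def progress_classification_train_step(model, optimizer, sample_images, sample_task_progress):
    """One training step: classify the sample images, measure the difference between the
    obtained task progress (to be compared) and the labeled sample task progress with a
    cross-entropy loss, then back-propagate and update the weights by gradient descent."""
    logits = model(sample_images)                               # progress classification processing
    loss = nn.CrossEntropyLoss()(logits, sample_task_progress)  # loss value (difference between the two progresses)
    optimizer.zero_grad()
    loss.backward()                                             # back propagation of the loss value
    optimizer.step()                                            # update weights along the gradient descent direction
    return loss.item()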
In step 303, a motion planning model is trained based on the sample images and the sample motion data.
Similarly, the motion planning model can be trained according to the collected sample images and the corresponding sample motion data, in combination with a back-propagation mechanism.
It is worth mentioning that some sample task progresses of a task may correspond to a single piece of motion data (i.e., motion data for directly controlling the robot), so the sample images corresponding to this type of sample task progress need not participate in the training of the motion planning model.
In some embodiments, the progress classification model corresponding to each task level and the motion planning model include a shared feature extraction network, and the motion planning model includes all networks in the progress classification model corresponding to the last task level.
Here, an example architecture of the progress classification models corresponding to the task levels and the motion planning model is provided, that is, the progress classification model corresponding to each task level and the motion planning model include a shared feature extraction network. The architecture of the feature extraction network is not limited in the embodiments of the present application; for example, it may include several cascaded convolutional layers, or several cascaded residual blocks. Furthermore, the motion planning model includes all networks in the progress classification model corresponding to the last task level, i.e., the network shared between the motion planning model and the progress classification model corresponding to the last task level is the entirety of the latter. Based on this architecture, the relevance between different progress classification models, and between the progress classification models and the motion planning model, can be strengthened, improving the model training effect.
In some embodiments, the training of the progress classification model corresponding to the task level where the sample task progress is located according to the sample image and the sample task progress can be implemented in such a manner that: when the task level of the sample task progress is a first task level, training a feature extraction network in a progress classification model corresponding to the first task level and networks except the feature extraction network according to the sample image and the sample task progress; and when the task level of the sample task progress is any task level except the first task level, training a network except the feature extraction network in a progress classification model corresponding to any task level according to the sample image and the sample task progress.
Under the condition that a progress classification model and a motion planning model corresponding to each task level comprise shared feature extraction networks, when the task level where a sample task progress corresponding to a sample image is located is a first task level, training all networks (including the feature extraction networks and the networks except the feature extraction networks) in the progress classification model corresponding to the first task level according to the sample image and the sample task progress; when the task level of the sample task progress corresponding to the sample image is any task level except the first task level (named as a task level A for convenience of distinguishing), training networks except the feature extraction network in the progress classification model corresponding to the task level A according to the sample image and the sample task progress.
In order to improve the training effect, the progress classification models corresponding to the task levels may be sequentially trained according to the level sequence, for example, after the training of the progress classification model corresponding to the first task level is completed, the progress classification model corresponding to the second task level is trained, and so on. Through the mode, the progress classification model corresponding to each task level can be effectively trained.
In some embodiments, training the motion planning model based on the sample images and the sample motion data as described above may be implemented in such a way that: and determining a difference network between the motion planning model and the progress classification model corresponding to the last task level, and training the difference network according to the sample image and the sample motion data.
Under the condition that the motion planning model comprises all networks in the progress classification model corresponding to the last task level, for the motion planning model, a difference network (such as a plurality of full connection layers) between the motion planning model and the progress classification model corresponding to the last task level can be determined, and the difference network is trained by the sample images and the corresponding sample motion data. It should be noted that, in order to improve the training effect, the progress classification models corresponding to the task levels may be sequentially trained according to the level sequence, and then the motion planning model may be trained.
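A minimal sketch of training only the difference network is given below; it assumes (for illustration only) that the modules shared with the last-level progress classification model can be identified by parameter-name prefixes, and that a mean-squared-error loss is used for the motion data regression, which the embodiment itself does not prescribe.

import torch
import torch.nn as nn

def train_difference_network(motion_planning_model, shared_prefixes, sample_images, sample_motion_data, lr=1e-3):
    """Freeze the parameters shared with the last-level progress classification model and
    update only the difference network (e.g., the additional fully connected layers)."""
    for name, param in motion_planning_model.named_parameters():
        param.requires_grad = not any(name.startswith(prefix) for prefix in shared_prefixes)
    trainable = [p for p in motion_planning_model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=lr)
    loss = nn.MSELoss()(motion_planning_model(sample_images), sample_motion_data)  # regression loss (one possible choice)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()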
In some embodiments, before training the feature extraction network and the networks other than the feature extraction network in the progress classification model corresponding to the first task level according to the sample image and the sample task progress, the method further includes: in the process of executing the historical task by the imitation object, acquiring a sample image of the environment where the imitation object is located and a corresponding sample task progress; training a feature extraction network in a progress classification model corresponding to the simulated object and networks except the feature extraction network according to the sample image of the simulated object and the corresponding sample task progress; and initializing the shared feature extraction network in the progress classification model and the motion planning model corresponding to the robot according to the trained feature extraction network corresponding to the simulation object.
In the embodiment of the present application, a specific simulation object may be simulated, where the simulation object may be a user, or may be another robot, such as a robot manually controlled by a user, and the manual control mode may be any one of a master-slave control mode, a sensing mode, and a drag teaching mode. In the process of executing the historical task by the imitation object, a sample image of an environment where the imitation object is located and a corresponding sample task progress are acquired, wherein the type of the historical task executed by the imitation object is the same as that of the historical task executed by the robot in step 301, for example, the historical task executed by the imitation object is a water pouring task, and the sample task progress corresponding to the sample image of the imitation object can also be obtained through artificial marking.
In order to enhance the simulation effect, the sample image of the environment where the simulation object is located may be set to have the same shooting angle and shooting range as the sample image of the environment where the robot is located, for example, the sample image of the environment where the robot is located includes an end effector (a wrist joint) of the robot, and the sample image of the environment where the simulation object (taking the user as an example) is located may include a wrist of the user.
And training a feature extraction network in a progress classification model corresponding to the simulation object and a network except the feature extraction network according to the collected sample image of the simulation object and the corresponding sample task progress. The progress classification model corresponding to the imitation object referred to herein may be a progress classification model corresponding to a first task level specifically set for the imitation object, and may have the same structure as the progress classification model corresponding to the first task level set for the robot, in which case, the sample task progress corresponding to the sample image of the imitation object is also the task progress in the first task level.
After the training is finished, initializing the shared feature extraction network in the progress classification model corresponding to the robot and the motion planning model according to the trained feature extraction network corresponding to the simulation object, and then training the initialized feature extraction network according to the sample image of the environment where the robot is located in the process of executing the historical task and the corresponding sample task progress. By the aid of the method, specific simulation objects (such as users) can be effectively simulated, effective initialization of the feature extraction network corresponding to the robot is achieved, follow-up training effects of the progress classification model and the motion planning model corresponding to the robot are improved, and convergence speed of the models is accelerated.
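A minimal sketch of this warm start, assuming (purely for illustration) that both models expose their shared feature extraction network as a `features` sub-module:

def init_feature_extraction_from_imitation(robot_model, imitation_model):
    """Copy the trained feature extraction weights of the imitation object's progress
    classification model into the shared feature extraction network of the robot's models."""
    robot_model.features.load_state_dict(imitation_model.features.state_dict())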
In fig. 3C, the step 102 shown in fig. 3A may be updated to step 304, and in step 304, a plurality of task levels are traversed according to the image, and progress classification processing is performed on the image according to the trained progress classification model corresponding to the traversed task level, so as to obtain the task progress of the robot in the traversed task level.
In the process of executing the task by the robot, a plurality of task levels can be traversed according to the image acquired in real time, and the progress classification processing is performed on the image according to the trained progress classification model corresponding to the traversed task levels to obtain the task progress of the robot in the traversed task levels.
In fig. 3C, step 103 shown in fig. 3A may be updated to step 305, and in step 305, the motion planning process is performed on the image according to the trained motion planning model to obtain the motion data.
Similarly, the motion planning processing can be performed on the images acquired in real time according to the trained motion planning model, so as to obtain the motion data.
As shown in fig. 3C, the embodiment of the application can effectively train multiple progress classification models and motion planning models corresponding to the robot, so as to improve the robustness of robot control according to the trained models.
In the following, an exemplary application of the embodiment of the present application in a practical application scenario will be described. For ease of understanding, a scenario in which the robot performs a water pouring task is taken as an example, but this does not constitute a limitation to the embodiment of the present application; the robot may also perform other tasks, such as tasks in human daily life (e.g., cooking, cleaning or nursing), and of course tasks in the industrial field.
The embodiment of the present application provides a schematic diagram of an artificial-intelligence-based robot control method as shown in fig. 4, which includes three steps: data acquisition, model training based on multi-layer imitation learning, and real-time robot control with safety constraints. These are described below.
1) Data acquisition.
Here, sample data of the robot in a historical water pouring task (corresponding to the historical task above) is collected, and in order to ensure the accuracy of the collected sample data, the robot may be controlled by a user to perform the historical water pouring task, wherein the sample data may include a sample image, a sample label corresponding to the sample image (corresponding to the progress of the sample task above, which may be labeled by the user), and sample motion data for controlling the robot at a time corresponding to the sample image.
In the embodiment of the application, the user can control the robot through a teleoperation mode (master-slave control mode), a wearable-sensor mode, or a drag teaching mode. In the teleoperation mode, the user's control instruction for the main controller corresponding to the robot is acquired, so that the robot is controlled according to the control instruction; in the wearable-sensor mode, the motion data of the user monitored by the sensor (a sensor worn by the user) is acquired, and a corresponding control instruction is generated to control the robot; in the drag teaching mode, the user can directly drag the robot to control it.
The embodiment of the present application provides an architecture diagram of data acquisition as shown in fig. 5, which shows a robot 51, a camera 52 mounted on a high platform, and a terminal device 53 (a computer in fig. 5); communication connections exist between the robot 51 and the terminal device 53, and between the camera 52 and the terminal device 53. Taking the teleoperation mode as an example, the terminal device 53 collects the user's control instruction for the main controller (not shown in fig. 5), i.e., the movement data of the user's wrist, and controls the robot 51 according to the control instruction, so that the robot 51 rotates the source container 54 following the motion of the user's wrist, that is, controls the source container 54 to pour water into the target container 55. During pouring, the camera 52 collects a plurality of sample images and sends them to the terminal device 53, and the terminal device 53 may determine the sample motion data of the robot 51 at the corresponding moments based on the sample images (of course, the sample motion data may also be determined directly by the camera 52, provided that the camera 52 has the corresponding function). The type of the camera 52 is not limited; for example, it may be an RGB-D depth camera. In addition, the sample motion data may also be determined in other manners and is not limited to being determined from images. It should be noted that the sample motion data may include motion data of all joints of the robot 51, or motion data of a certain joint (e.g., the wrist joint) of the robot 51; the latter case is taken as an example below for ease of understanding.
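A simplified sketch of the acquisition loop on the terminal device 53 is shown below; capture(), read_wrist_motion() and execute() stand in for the actual camera, main controller and robot interfaces and are assumptions of this illustration, not part of the embodiment.

import time

def collect_samples(camera, main_controller, robot, duration_s, period_s=0.1):
    """Teleoperation-mode data acquisition: forward the user's control to the robot while
    recording (sample image, sample motion data) pairs at a fixed sampling period."""
    samples = []
    end_time = time.time() + duration_s
    while time.time() < end_time:
        motion = main_controller.read_wrist_motion()   # control instruction from the main controller
        robot.execute(motion)                          # the robot follows the user's wrist motion
        image = camera.capture()                       # sample image of the environment (e.g., RGB-D)
        samples.append({"image": image, "motion": motion, "label": None})
        time.sleep(period_s)
    # Sample task progress labels (stage / state) are added afterwards, e.g., by manual labeling.
    return samples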
As an example, the embodiment of the present application further provides an architecture diagram of data acquisition as shown in fig. 6. Taking the teleoperation mode as an example, fig. 6 illustrates a camera, a main controller, a terminal device for data acquisition, a terminal device for robot control, and the robot, where the camera and the terminal device for data acquisition may be connected through USB, and the terminal device for data acquisition and the terminal device for robot control may establish a connection through the Transmission Control Protocol (TCP); of course, this does not constitute a limitation to the embodiment of the present application. It should be noted that, in fig. 6, the terminal device for data acquisition and the terminal device for robot control are different terminal devices, but in an actual application scenario the same terminal device may support both the data acquisition function and the robot control function.
2) Model training based on multi-layer imitation learning.
The embodiment of the application provides a multi-layer imitation learning framework. The first level is the stage task level, which mainly infers the meaning of the human activity through a corresponding perception model (such as a neural network model), i.e., estimates the current stage. The water pouring task can be divided into 4 candidate stages (corresponding to the candidate task progresses in the stage task hierarchy): [control the source container to approach the target container], [start pouring], [slow down the pouring speed], and [end pouring and remove the source container]. The stage labels representing the 4 candidate stages can be set according to the actual application scenario; for example, the stage labels of the 4 candidate stages are 1, 2, 3 and 4 in sequence.
The second level is the state task level, which mainly estimates the state of the water pouring through a corresponding neural network model, i.e., the extent to which the inner space of the target container has been filled with water (the task progress in the state task level). Here, a plurality of candidate states and a state label corresponding to each candidate state may be set in advance; for example, state label 1 corresponds to the candidate state in which 10% of the internal space of the target container has been filled with water, state label 2 corresponds to 20%, and so on up to state label 10, which corresponds to 100%. The first level and the second level correspond to the cascaded task levels described above.
The third level is the motion data level, which mainly estimates the motion data of the robot (such as the motion data of the robot's wrist joint) through a corresponding neural network model; a control instruction for the robot is then determined through the inverse kinematics principle to control the robot. Only three levels are used here as an example for explanation; in an actual application scenario, more levels (i.e., more task levels) can be divided according to different tasks, making the learning of subtasks more effective while enhancing the generality of the trained models.
After the sample data is obtained through data acquisition, the neural network models respectively corresponding to the three levels can be trained based on the sample data. The trained neural network model corresponding to the first level and the trained neural network model corresponding to the second level can be suitable for various robot platforms, the third level is related to the data type (the data type can be a pose value, an angular velocity value or a moment value and the like) of motion data, and the data types of the motion data used for controlling the robot in different robot platforms are possibly different, so that the trained neural network model corresponding to the third level is suitable for a specific robot platform, namely the robot platform used for data acquisition.
In this embodiment of the application, the neural network model corresponding to each of the three levels may include a feature extraction network and a plurality of fully connected (FC) layers, where the feature extraction network may be shared by the neural network models corresponding to the three levels. As an example, the embodiment of the present application provides a schematic diagram of the neural network models corresponding to the three levels as shown in fig. 7. As shown in fig. 7, the feature extraction network may include a plurality of residual blocks; the number of convolutional layers in each residual block can be flexibly adjusted, and batch normalization (BN) and activation processing based on the rectified linear unit (ReLU) are also performed in each residual block. Based on the residual blocks, the neural network models can be trained more deeply, accurately and effectively, improving the prediction precision of the trained neural network models. In an actual application scenario, the number of residual blocks in the feature extraction network can be increased or decreased according to the complexity of the task, and the number of fully connected layers can likewise be adjusted.
In fig. 7, the neural network model corresponding to the first level (i.e., the progress classification model corresponding to the stage task level, hereinafter denoted as the F1 model) includes the feature extraction network and fully connected layers 1 to K; the neural network model corresponding to the second level (i.e., the progress classification model corresponding to the state task level, hereinafter denoted as the F2 model) includes the feature extraction network and fully connected layers 1' to M'; the neural network model corresponding to the third level (i.e., the motion planning model, hereinafter denoted as the F3 model) includes the feature extraction network, fully connected layers 1' to M', and fully connected layers 1'' to L'', where K, M and L are all integers greater than 0. The F1 model and the F2 model are classification models and may perform activation processing through a SoftMax activation function to output a label; the F3 model is a regression model and may perform activation processing through a linear activation function to output motion data.
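The following PyTorch-style sketch shows one possible realization of the shared structure in fig. 7; the channel widths, layer sizes and residual-block depth are placeholders chosen for illustration rather than values prescribed by this embodiment.

import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.body(x) + x)

class MultiLevelModel(nn.Module):
    """Shared feature extraction network; the F1 head outputs a stage label, the F2 head
    outputs a state label, and the F3 head builds on the F2 output to regress motion data."""
    def __init__(self, channels=32, num_stages=4, num_states=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            ResidualBlock(channels), ResidualBlock(channels),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.f1_head = nn.Sequential(nn.Linear(channels, 64), nn.ReLU(), nn.Linear(64, num_stages))
        self.f2_head = nn.Sequential(nn.Linear(channels, 64), nn.ReLU(), nn.Linear(64, num_states))
        self.f3_head = nn.Sequential(nn.Linear(num_states, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, image):
        h = self.features(image)
        stage_logits = self.f1_head(h)        # F1 model: SoftMax over candidate stages at inference
        state_logits = self.f2_head(h)        # F2 model: SoftMax over candidate states at inference
        motion = self.f3_head(state_logits)   # F3 model: linear output of motion data
        return stage_logits, state_logits, motion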
Let D_T denote the set of sample data obtained through data acquisition, where each sample datum includes a sample image, a sample label corresponding to the sample image, and sample motion data at the moment corresponding to the sample image. The sample labels may be labeled by a user and include a sample stage label at the first level and a sample state label at the second level. For the sample stage label c: c = 1 indicates the [control the source container to approach the target container] sample stage (corresponding to the sample task progress above), c = 2 indicates the [start pouring] sample stage, c = 3 indicates the [slow down the pouring speed] sample stage, and c = 4 indicates the [end pouring and remove the source container] sample stage; C denotes the number of divided candidate stages (here, 4). For the sample state label s: s = 1 indicates the sample state in which 10% of the internal space of the target container has been filled with water, s = 2 indicates the sample state in which 20% has been filled, and so on; S denotes the number of divided candidate states.
For the F1 model, the training process can be regarded as supervised classification training; the training set O_p includes all sample images in D_T and the sample stage label corresponding to each sample image. For the F2 model, the training process can likewise be regarded as supervised classification training; the training set O_s includes all sample images in D_T that belong to the [start pouring] and [slow down the pouring speed] stages (i.e., all sample images satisfying c = 2 or c = 3) and the sample state label corresponding to each sample image. For the F3 model, the training process can be regarded as supervised regression training; the training set O_a includes all sample images in D_T that belong to the [start pouring] and [slow down the pouring speed] stages, together with the sample motion data corresponding to each sample image. Of course, the training set O_a may alternatively include all sample images in D_T and the sample motion data corresponding to each sample image.
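Under the above definitions, the split of D_T into the three training sets can be sketched as follows (the field names are illustrative only):

def build_training_sets(d_t):
    """d_t: list of sample data dicts with keys 'image', 'stage' (label c),
    'state' (label s) and 'motion' (sample motion data)."""
    o_p = [(d["image"], d["stage"]) for d in d_t]                            # F1: all images + stage labels
    o_s = [(d["image"], d["state"]) for d in d_t if d["stage"] in (2, 3)]    # F2: [start pouring] / [slow down] stages
    o_a = [(d["image"], d["motion"]) for d in d_t if d["stage"] in (2, 3)]   # F3: same stages + motion data
    return o_p, o_s, o_a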
For ease of understanding, the model training process based on multi-layer imitation learning is illustrated in the form of steps. The F1 model is trained as follows:
① Initialize the weight parameter θ'_a of the F1 model, where θ'_a includes the weight parameters of the feature extraction network and the weight parameters of fully connected layers 1 to K;
② Sample from the training set O_p according to the batch size (Batch Size); for example, taking a batch size of 20 as an example, 20 sample images and the sample stage label corresponding to each sample image are sampled from O_p at a time;
③ Calculate the loss value L_a corresponding to the sampled data;
④ Update the weight parameter θ'_a according to the obtained loss value L_a in combination with a gradient descent mechanism, i.e., execute θ_a = θ'_a − α∇_{θ'_a}L_a and then θ'_a = θ_a, where α denotes the learning rate and ∇ denotes the gradient operator. Steps ② to ④ may be executed repeatedly until the convergence condition of the F1 model is satisfied.
While training the F1 model, the pose value range [p_min(c), p_max(c)] (c = 0, 1, ..., C) of the robot wrist joint corresponding to each sample stage is also obtained from D_T.
The F2 model is trained as follows:
① Initialize the weight parameter θ'_b of the F2 model, where θ'_b includes the weight parameters of fully connected layers 1' to M';
② Sample from the training set O_s according to the batch size (Batch Size); for example, taking a batch size of 20 as an example, 20 sample images and the sample state label corresponding to each sample image are sampled from O_s at a time;
③ Calculate the loss value L_b corresponding to the sampled data;
④ Update the weight parameter θ'_b according to the obtained loss value L_b in combination with a gradient descent mechanism, i.e., execute θ_b = θ'_b − α∇_{θ'_b}L_b and then θ'_b = θ_b. Steps ② to ④ may be executed repeatedly until the convergence condition of the F2 model is satisfied.
While training the F2 model, the angular velocity range [v_min(s), v_max(s)] (s = 0, 1, ..., S) of the robot wrist joint rotation corresponding to each sample state is also obtained from D_T.
The F3 model is trained as follows:
① Initialize the weight parameter θ'_c of the F3 model, where θ'_c includes the weight parameters of fully connected layers 1'' to L'';
② Sample from the training set O_a according to the batch size (Batch Size); for example, taking a batch size of 20 as an example, 20 sample images and the sample motion data corresponding to each sample image (taking the angular velocity of the robot wrist joint rotation as an example) are sampled from O_a at a time;
③ Calculate the loss value L_c corresponding to the sampled data;
④ Update the weight parameter θ'_c according to the obtained loss value L_c in combination with a gradient descent mechanism, i.e., execute θ_c = θ'_c − α∇_{θ'_c}L_c and then θ'_c = θ_c. Steps ② to ④ may be executed repeatedly until the convergence condition of the F3 model is satisfied.
After the F1 model, the F2 model and the F3 model have been trained, they can be applied to online real-time robot control to execute the robot's intelligent water pouring task. The embodiment of the present application provides an architecture diagram of online real-time robot control as shown in fig. 8, which illustrates a camera, a terminal device for processing real-time images (by means of the trained neural network models corresponding to the three levels), a terminal device for robot control, and the robot.
3) Real-time robot control with safety constraints.
Here, p(t) denotes the predicted pose value of the robot wrist joint at time t, v(t) denotes the predicted angular velocity of the robot wrist joint rotation at time t, and the real-time robot control process is described in the form of pseudo code as follows:

Initialize p(t) = 0, t = 0;
While the water pouring task is not finished, execute {
    t = t + δt; (δt denotes the reciprocal of the control frequency of the robot, i.e., the time interval between two adjacent control commands sent to the robot)
    c = F1(o_t); (o_t denotes the image collected by the camera at time t)
    When c = 1, i.e., the stage determined from the plurality of candidate stages is [control the source container to approach the target container], execute {
        p(t) = p(t-1) + v_0·δt; (v_0 is the single motion data corresponding to the [control the source container to approach the target container] stage and can be set according to the actual application scenario) }
    When c = 2, i.e., the stage determined from the plurality of candidate stages is [start pouring], execute {
        s = F2(o_t); v(t)' = F3(o_t);
        When v(t)' < v_min(s), execute { v(t) = v_min(s); }
        When v(t)' ≥ v_min(s), execute { v(t) = v(t)'; }
        p(t)' = p(t-1) + v(t)·δt;
        When p(t)' < p_min(c), execute { p(t) = p_min(c); }
        When p(t)' ≥ p_min(c), execute { p(t) = p(t)'; } }
    When c = 3, i.e., the stage determined from the plurality of candidate stages is [slow down the pouring speed], execute {
        s = F2(o_t); v(t)' = F3(o_t);
        When v(t)' > v_max(s), execute { v(t) = v_max(s); }
        When v(t)' ≤ v_max(s), execute { v(t) = v(t)'; }
        p(t)' = p(t-1) + v(t)·δt;
        When p(t)' > p_max(c), execute { p(t) = p_max(c); }
        When p(t)' ≤ p_max(c), execute { p(t) = p(t)'; } }
    When c = 4, i.e., the stage determined from the plurality of candidate stages is [end pouring and remove the source container], execute {
        p(t) = p(t-1) + v_1·δt; (v_1 is the single motion data corresponding to the [end pouring and remove the source container] stage and can be set according to the actual application scenario) } }
The final output p(t) of the pseudo code is the target motion data; p(t) can be processed through the inverse kinematics principle to obtain the rotation angles of a plurality of joints of the robot, so that a corresponding control instruction (including the rotation angles of the plurality of joints) is sent to the robot to control the rotation angles of the robot's motors. Further, the above pseudo code takes as an example the case where the motion data output by the F3 model is the predicted angular velocity of the robot wrist joint rotation; of course, the motion data output by the F3 model may also have other data types, depending on the data type in the training set O_a of the F3 model.
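A hedged sketch of turning the output p(t) into a control instruction is given below; solve_inverse_kinematics() is a placeholder for whatever inverse kinematics routine the concrete robot platform provides, and the instruction format and robot.send interface are assumptions of this illustration.

def pose_to_control_instruction(p_t, solve_inverse_kinematics):
    """Convert the target wrist pose p(t) into joint rotation angles via inverse kinematics
    and package them as a control instruction for the robot's motors."""
    joint_angles = solve_inverse_kinematics(p_t)   # rotation angles of the robot's joints
    return {"joint_angles": joint_angles}

# Usage sketch: at each control period δt, send the instruction to the robot.
# instruction = pose_to_control_instruction(p_t, ik_solver)
# robot.send(instruction)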
It should be noted that the control method of the robot is not limited in the embodiments of the present application, and for example, modes of speed (such as angular speed) control, attitude (such as pose value) control, or moment value control may be applied. The control data in the finally generated control instruction can be modified adaptively according to the specific robot platform used.
The embodiment of the application has at least the following technical effects: 1) a multi-layer imitation learning framework is provided, which improves the interpretability of the model: the determined task progress (such as the stage and the state) can be output so that a human can understand the behavior of the robot (i.e., explain why the target motion data was output), and if an unexpected fault occurs during robot control, the cause of failure can be traced and explained, so that the model can be further improved in a clear direction, alleviating the black-box effect of traditional neural networks; 2) the multi-layer imitation learning framework does not depend strongly on data, the common knowledge learned by the first level and the second level transfers conveniently, and a large amount of data does not need to be collected again when learning different tasks; 3) complex fluid model modeling can be avoided, without excessive model assumptions and constraints; 4) the application scenarios are wide, that is, the method can be widely applied to various goal-driven tasks, including but not limited to water pouring, cooking, cleaning and nursing; taking a cleaning task as an example, the tabletop to be cleaned can be divided into a plurality of areas, each area corresponds to a state, and the final goal is to clean the entire tabletop.
Continuing with the exemplary structure of the artificial intelligence based robot control device 455 provided by the embodiments of the present application implemented as software modules, in some embodiments, as shown in fig. 2, the software modules stored in the artificial intelligence based robot control device 455 of the memory 450 may include: the acquisition module 4551 is used for acquiring images of the environment where the robot is located in the process of executing tasks by the robot; the task comprises a plurality of cascaded task levels, and each task level comprises a plurality of candidate task schedules; the progress determining module 4552 is configured to traverse a plurality of task levels according to the image, and determine a task progress of the robot from a plurality of candidate task progresses included in the traversed task level; a motion planning module 4553, configured to perform motion planning processing according to the image to obtain motion data of the robot; a combination module 4554, configured to determine target motion data according to task progress and motion data in multiple task hierarchies; and a control module 4555 for controlling the robot according to the target motion data.
In some embodiments, the progress determination module 4552 is further configured to: according to the progress classification model corresponding to the traversed task level, performing progress classification processing on the image to obtain the task progress of the robot in the traversed task level; and the progress classification models corresponding to different task levels are different.
In some embodiments, the motion planning module 4553 is further configured to: and performing motion planning processing on the image according to the motion planning model to obtain motion data of the robot.
In some embodiments, the artificial intelligence based robot control device 455 further comprises: the robot sample acquisition module is used for acquiring a sample image of the environment where the robot is located, a sample task progress corresponding to the sample image and sample motion data for controlling the robot at the moment corresponding to the sample image in the process of executing the historical task by the robot; the first training module is used for training a progress classification model corresponding to a task level where the sample task progress is located according to the sample image and the sample task progress; and the second training module is used for training the motion planning model according to the sample images and the sample motion data.
In some embodiments, the progress classification model corresponding to each task level and the motion planning model include a shared feature extraction network, and the motion planning model includes all networks in the progress classification model corresponding to the last task level.
In some embodiments, the first training module is further to: when the task level of the sample task progress is a first task level, training a feature extraction network in a progress classification model corresponding to the first task level and networks except the feature extraction network according to the sample image and the sample task progress; and when the task level of the sample task progress is any task level except the first task level, training a network except the feature extraction network in a progress classification model corresponding to any task level according to the sample image and the sample task progress.
In some embodiments, the second training module is further to: and determining a difference network between the motion planning model and the progress classification model corresponding to the last task level, and training the difference network according to the sample image and the sample motion data.
In some embodiments, the artificial intelligence based robot control device 455 further comprises: the simulation object sample acquisition module is used for acquiring a sample image of the environment where the simulation object is located and a corresponding sample task progress in the process of executing the historical task by the simulation object; the third training module is used for training a feature extraction network in a progress classification model corresponding to the simulated object and networks except the feature extraction network according to the sample image of the simulated object and the corresponding sample task progress; and the initialization module is used for initializing the shared feature extraction network in the progress classification model and the motion planning model corresponding to the robot according to the trained feature extraction network corresponding to the simulation object.
In some embodiments, the artificial intelligence based robot control device 455 further comprises: the range building module is used for executing the following processing for each collected sample task progress: constructing a motion data range corresponding to the sample task progress according to a plurality of sample motion data corresponding to the sample task progress, so as to determine target motion data by combining motion data obtained by motion planning processing in the process of executing the task by the robot.
In some embodiments, the artificial intelligence based robot control device 455 further comprises: the teaching control module is used for executing any one of the following processes in the process that the robot executes the historical task: acquiring a control instruction aiming at the robot to control the robot; and acquiring motion data of a control object of the robot, and generating a control instruction according to the motion data of the control object so as to control the robot.
In some embodiments, the combination module 4554 is further configured to: when the task progress of each task level corresponds to one motion data range, for the task progress in each task level, updating the motion data according to the motion data range corresponding to the task progress; and taking the motion data obtained after updating according to the task progress in the plurality of task levels as target motion data.
In some embodiments, the combination module 4554 is further configured to: when the motion data is successfully matched with the motion data range corresponding to the task progress, keeping the motion data unchanged; and when the motion data fails to be matched with the motion data range corresponding to the task progress, determining new motion data according to the motion data range corresponding to the task progress.
In some embodiments, the combination module 4554 is further configured to perform at least one of the following: when the motion data is smaller than the minimum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and taking the minimum motion data as new motion data; and when the motion data is larger than the maximum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and taking the maximum motion data as new motion data.
In some embodiments, when the data types of the motion data ranges respectively corresponding to the task schedules in different task hierarchies are different, the combination module 4554 is further configured to: and when the data type of the motion data fails to be matched with the data type corresponding to the motion data range corresponding to the task progress, performing data type conversion processing on the motion data according to the data type corresponding to the motion data range corresponding to the task progress.
In some embodiments, the progress determination module 4552 is further configured to: when the task progress in the last traversed task level and at least part of candidate task progress included in the traversed task level have a cascade relation, determining the task progress of the robot in at least part of the candidate task progress according to the image; and when the task progress in the last traversed task level and all candidate task progresses included in the traversed task level have no cascade relation, stopping the traversal, and determining that the task progress in the traversed task level and the task progress in the subsequent task level which is not traversed is empty.
In some embodiments, the artificial intelligence based robot control device 455 further comprises: the traversal stopping module is used for stopping traversal when the task progress in the traversed task level corresponds to a single piece of motion data; and the direct control module is used for controlling the robot according to the motion data corresponding to the task progress in the traversed task hierarchy.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the artificial intelligence based robot control method according to the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, an artificial intelligence based robot control method as illustrated in fig. 3A, 3B and 3C.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (17)

1. A robot control method based on artificial intelligence, the method comprising:
acquiring an image of an environment where a robot is located in the process of executing a task by the robot; wherein the task comprises a plurality of task levels in cascade connection, and each task level comprises a plurality of candidate task schedules;
traversing the plurality of task levels according to the image to determine the task progress of the robot in a plurality of candidate task progresses included in the traversed task levels;
performing motion planning processing according to the image to obtain motion data of the robot;
determining target motion data according to the task progress in the plurality of task levels and the motion data;
and controlling the robot according to the target motion data.
2. The method of claim 1, wherein determining the task progress of the robot among the plurality of candidate task progresses included in the traversed task hierarchy comprises:
according to the progress classification model corresponding to the traversed task level, performing progress classification processing on the image to obtain the task progress of the robot in the traversed task level;
the progress classification models corresponding to different task levels are different;
the motion planning processing according to the image to obtain the motion data of the robot includes:
and performing motion planning processing on the image according to a motion planning model to obtain motion data of the robot.
3. The method of claim 2, further comprising:
in the process that the robot executes the historical task, acquiring a sample image of the environment where the robot is located, a sample task progress corresponding to the sample image and sample motion data for controlling the robot at the moment corresponding to the sample image;
training a progress classification model corresponding to the task level of the sample task progress according to the sample image and the sample task progress;
and training the motion planning model according to the sample images and the sample motion data.
4. The method of claim 3, wherein the progress classification model and the motion planning model for each task level comprise a shared feature extraction network, and the motion planning model comprises all networks in the progress classification model for the last task level.
5. The method according to claim 4, wherein the training of the progress classification model corresponding to the task level at which the sample task progress is located according to the sample image and the sample task progress comprises:
when the task level of the sample task progress is a first task level, training the feature extraction network and networks except the feature extraction network in a progress classification model corresponding to the first task level according to the sample image and the sample task progress;
when the task level of the sample task progress is any task level except the first task level, training a network except the feature extraction network in a progress classification model corresponding to the any task level according to the sample image and the sample task progress;
the training the motion planning model according to the sample images and the sample motion data includes:
and determining a difference network between the motion planning model and the progress classification model corresponding to the last task level, and training the difference network according to the sample image and the sample motion data.
6. The method of claim 4, further comprising:
in the process of executing a historical task by a simulation object, acquiring a sample image of an environment where the simulation object is located and a corresponding sample task progress;
training the feature extraction network and networks except the feature extraction network in a progress classification model corresponding to the imitation object according to the sample image of the imitation object and the corresponding sample task progress;
initializing a progress classification model corresponding to the robot and a shared feature extraction network in the motion planning model according to the trained feature extraction network corresponding to the simulation object.
7. The method of claim 3, wherein after acquiring the sample image of the environment where the robot is located, the progress of the sample task corresponding to the sample image, and the sample motion data for controlling the robot at the time corresponding to the sample image, the method further comprises:
for each collected sample task progress, executing the following processing:
and constructing a motion data range corresponding to the sample task progress according to a plurality of sample motion data corresponding to the sample task progress, so as to determine target motion data by combining motion data obtained by motion planning processing in the process of executing the task by the robot.
8. The method of claim 3, wherein during the execution of the historical tasks by the robot, the method further comprises:
any one of the following processes is performed:
acquiring a control instruction for the robot to control the robot;
and acquiring motion data of a control object of the robot, and generating a control instruction according to the motion data of the control object so as to control the robot.
9. The method according to claim 1, wherein when the task progress of each task hierarchy corresponds to one motion data range, the determining the target motion data according to the task progress in the plurality of task hierarchies and the motion data comprises:
aiming at the task progress in each task level, updating the motion data according to the motion data range corresponding to the task progress;
and taking the motion data obtained after updating according to the task progress in the plurality of task levels as target motion data.
10. The method according to claim 9, wherein the updating the motion data according to the motion data range corresponding to the task progress comprises:
when the motion data is successfully matched with the motion data range corresponding to the task progress, keeping the motion data unchanged;
and when the motion data fails to be matched with the motion data range corresponding to the task progress, determining new motion data according to the motion data range corresponding to the task progress.
11. The method according to claim 10, wherein when the motion data fails to match with the motion data range corresponding to the task progress, determining new motion data according to the motion data range corresponding to the task progress comprises:
performing at least one of:
when the motion data is smaller than the minimum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and taking the minimum motion data as new motion data;
and when the motion data is larger than the maximum motion data in the motion data range corresponding to the task progress, determining that the motion data fails to be matched with the motion data range corresponding to the task progress, and taking the maximum motion data as new motion data.
12. The method according to claim 9, wherein when the data types of the motion data ranges respectively corresponding to the task schedules in different task hierarchies are different, before the motion data is updated according to the motion data ranges corresponding to the task schedules, the method further includes:
and when the data type of the motion data fails to be matched with the data type corresponding to the motion data range corresponding to the task progress, performing data type conversion processing on the motion data according to the data type corresponding to the motion data range corresponding to the task progress.
13. The method of claim 1, wherein determining the task progress of the robot among the plurality of candidate task progresses included in the traversed task hierarchy comprises:
when the task progress in the last traversed task level and at least part of candidate task progress included in the traversed task level have a cascade relation, determining the task progress of the robot in the at least part of candidate task progress according to the image;
and stopping traversing when the task progress in the last traversed task level and all candidate task progresses included in the traversed task level have no cascade relation, and determining that the task progress in the traversed task level and the task progress in the subsequent non-traversed task level is empty.
14. The method of claim 1, wherein in traversing the plurality of task levels from the image, the method further comprises:
stopping traversing when the task progress in the traversed task hierarchy corresponds to a single motion data;
and controlling the robot according to the motion data corresponding to the task progress in the traversed task hierarchy.
15. An artificial intelligence based robot control apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring images of the environment where the robot is located in the process of executing tasks by the robot; wherein the task comprises a plurality of task levels in cascade connection, and each task level comprises a plurality of candidate task schedules;
the progress determining module is used for traversing the plurality of task levels according to the image so as to determine the task progress of the robot in a plurality of candidate task progresses included in the traversed task levels;
the motion planning module is used for carrying out motion planning processing according to the image to obtain motion data of the robot;
the combination module is used for determining target movement data according to the task progress in the plurality of task levels and the movement data;
and the control module is used for controlling the robot according to the target motion data.
16. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the artificial intelligence based robot control method of any one of claims 1 to 14 when executing executable instructions stored in the memory.
17. A computer-readable storage medium storing executable instructions for implementing the artificial intelligence based robot control method of any one of claims 1 to 14 when executed by a processor.
CN202110023393.8A 2021-01-08 2021-01-08 Robot control method and device based on artificial intelligence and electronic equipment Active CN113524166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110023393.8A CN113524166B (en) 2021-01-08 2021-01-08 Robot control method and device based on artificial intelligence and electronic equipment

Publications (2)

Publication Number    Publication Date
CN113524166A          2021-10-22
CN113524166B          2022-09-30

Family

ID=78124252

Family Applications (1)

Application Number: CN202110023393.8A
Title: Robot control method and device based on artificial intelligence and electronic equipment
Priority Date: 2021-01-08
Filing Date: 2021-01-08
Status: Active (granted as CN113524166B)

Country Status (1)

CN: CN113524166B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115946128A (en) * 2023-03-09 2023-04-11 珞石(北京)科技有限公司 Method for realizing man-machine cooperation strategy based on hierarchical state machine

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101579857A (en) * 2008-12-12 2009-11-18 北京理工大学 Automatic ball-picking robot
US20160129592A1 (en) * 2014-11-11 2016-05-12 Google Inc. Dynamically Maintaining A Map Of A Fleet Of Robotic Devices In An Environment To Facilitate Robotic Action
CN108527367A (en) * 2018-03-28 2018-09-14 华南理工大学 A kind of description method of multirobot work compound task
CN108858195A (en) * 2018-07-16 2018-11-23 睿尔曼智能科技(北京)有限公司 A kind of Triple distribution control system of biped robot
CN109483534A (en) * 2018-11-08 2019-03-19 腾讯科技(深圳)有限公司 A kind of grasping body methods, devices and systems
CN110785268A (en) * 2017-06-28 2020-02-11 谷歌有限责任公司 Machine learning method and device for semantic robot grabbing
CN111203878A (en) * 2020-01-14 2020-05-29 北京航空航天大学 Robot sequence task learning method based on visual simulation
KR20200080372A (en) * 2018-12-14 2020-07-07 삼성전자주식회사 Robot control apparatus and method for learning task skill of the robot
WO2020142498A1 (en) * 2018-12-31 2020-07-09 Abb Schweiz Ag Robot having visual memory
CN112183188A (en) * 2020-08-18 2021-01-05 北京航空航天大学 Mechanical arm simulation learning method based on task embedded network

Also Published As

Publication Number    Publication Date
CN113524166B (en)     2022-09-30

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
REG: Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40055207; Country of ref document: HK)
GR01: Patent grant