WO2021069129A1

WO2021069129A1 - Device and method for controlling a robot device

Info

Publication number: WO2021069129A1
Application number: PCT/EP2020/072410
Authority: WO
Inventors: Mathias Buerger; Philipp Christian Schillinger; Meng Guo
Original assignee: Robert Bosch Gmbh
Priority date: 2019-10-07
Filing date: 2020-08-10
Publication date: 2021-04-15
Also published as: DE102019216229B4; DE102019216229A1

Abstract

According to one embodiment, a method for controlling a robot device is provided, said method comprising the following steps: a demonstration entity performs an activity; a sequence of positions of a part of the demonstration entity with which the demonstration entity performs the action is recorded; training data is generated on the basis of the recorded sequence of positions of the part of the demonstration entity; a robot control model is trained using the sequence of positions of the part of the demonstration entity as a training sequence of positions for a part of the robot; and the robot device is controlled on the basis of the trained robot control model.

Description

Apparatus and method for controlling a robotic device

Various embodiments generally relate to an apparatus and a method for controlling a robotic device.

The teaching of robotic skills from demonstrations is becoming increasingly important and popular in the field of robot manipulation because of its efficiency and intuitiveness. However, such demonstrations typically require direct movement of the robot (e.g. by physical pushing or remote control) and further interaction with the objects in the vicinity of the robot.

The publication "A Tutorial on Task-Parameterized Movement Learning and Retrieval" by S. Calinon, Intelligent Service Robotics, 9(1): 1-29, 2016, describes task-parameterized models that make it possible, in particular, to train a robot on the basis of such demonstrations.

Moving the robot directly, however, brings with it the limitations that it requires experience to operate a robot directly and poses potential safety risks, especially for laypeople. In addition, even for experienced operators, it is often physically (e.g. mechanically) challenging to move the robot in the correct way to accomplish a given task, especially when the task requires precise movements.

Accordingly, improved approaches to teaching a robot skills are desirable.

The method and the apparatus having the features of independent claims 1 and 15 enable users to teach a robot a skill (activity) by performing the skill with their own hands. This makes it possible to teach a skill to a robot without additional knowledge or training, and independently of the dynamic and mechanical limitations of the robot. Teaching the robot is intuitive, since the user can behave as they would without the robot. In addition, no additional modeling of the robot is required, since it is the hand of a user that is tracked rather than the objects to be manipulated.

According to one embodiment, a method for controlling a robot device is provided, comprising: performing an activity by a demonstration entity; recording a sequence of positions of a part of the demonstration entity by means of which the demonstration entity performs the activity; generating training data based on the recorded sequence of positions of the part of the demonstration entity; training a robot control model by using the sequence of positions of the part of the demonstration entity as a training sequence of positions for a part of the robot; and controlling the robot device based on the trained robot control model. The method described in this paragraph is a first example.

The recording of the sequence of positions can include the recording of sensor data by one or more sensors and the determination of the sequence of positions from the sensor data. The features mentioned in this paragraph in combination with the first example form a second example.

The recording of the sequence of positions can include the visual recording of the performance of the activity by the demonstrator. The visual recording of activities or skills for training a robot enables a user to easily demonstrate the activity or skill. The features mentioned in this paragraph in combination with any one of the first example to the second example form a third example.

The visual recording can include the recording of at least one sequence of camera images. Providing a camera for recording camera images is a simple way of recording training data. Several cameras can also be provided in order to obtain images from different angles. The features mentioned in this paragraph in combination with the third example form a fourth example.

The positions can be determined based on an image analysis of the sequence of camera images, in which the position and/or orientation of the part of the demonstration entity is tracked in the at least one sequence of camera images. The features mentioned in this paragraph in combination with the fourth example form a fifth example.

Controlling the robot device based on the trained robot control model can include generating a reference trajectory from the trained robot control model and tracking (at least approximately) the reference trajectory by the robot device using a linear-quadratic Gaussian (LQG) controller. The features mentioned in this paragraph in combination with any one of the first example through the fifth example form a sixth example.

The method can include determining a sequence of positions of one or more further elements that are involved in the activity, the training data being generated based on the sequence of positions of the one or more further elements. In particular, this enables a robot to be trained in such a way that the robot can adapt to new scenarios. The features mentioned in this paragraph in combination with any one of the first example to the sixth example form a seventh example.

In the method, a sequence of postures of an object that is manipulated when performing the activity can be determined and the training data can be generated based on the sequence of postures of the object. The features mentioned in this paragraph in combination with the seventh example form an eighth example.

The method may include performing the activity multiple times by the demonstration entity; recording, for each performance of the activity, a sequence of positions of the part of the demonstration entity; generating training data based on the recorded sequences of positions of the part of the demonstration entity; and training the robot control model by using the sequences of positions of the part of the demonstration entity as training sequences of positions for the part of the robot. Repeated performance enables more robust training and enables the robot to be trained in such a way that it can adapt to new scenarios. The features mentioned in this paragraph in combination with any one of the first example through the eighth example form a ninth example.

The robot control model can have a statistical model. The features mentioned in this paragraph in combination with any one of the first example through the ninth example form a tenth example.

The method can include determining a sequence of positions of at least one object that is manipulated during the execution of the activity, and the statistical model can be parameterized by means of the determined positions. The features mentioned in this paragraph in combination with the tenth example form an eleventh example.

The statistical model can have or be a task-parameterized Gaussian mixture model. As described below, such models enable effective training of a robot. The features mentioned in this paragraph in combination with any one of the first example to the eleventh example form a twelfth example.

The statistical model can have or be a task-parameterized hidden semi-Markov model. As described below, such a model enables effective training of a robot. The features mentioned in this paragraph in combination with any one of the first example through the twelfth example form a thirteenth example.

When performing the activity, there is, for example, no spatial contact between the demonstration entity and the robot device. The features mentioned in this paragraph in combination with any one of the first example through the thirteenth example form a fourteenth example.

The kinematics of the demonstration entity and the robot device can differ. The features mentioned in this paragraph in combination with any one of the first example through the fourteenth example form a fifteenth example.

According to one embodiment, a robot control device is provided which is configured to perform a method according to any one of the first to fourteenth examples. The device described in this paragraph constitutes a sixteenth example.

According to one embodiment, a computer program is provided comprising program instructions which, when executed by one or more processors, cause the one or more processors to perform a method according to any one of the first to fifteenth examples.

According to one embodiment, a computer-readable storage medium is provided having stored thereon program instructions which, when executed by one or more processors, cause the one or more processors to perform a method according to any one of the first to fifteenth examples.

Embodiments of the invention are shown in the figures and are explained in more detail below. In the drawings, like reference characters generally refer to the same parts throughout the several views. The drawings are not necessarily to scale, emphasis instead being placed generally on illustrating the principles of the invention.

Figure 1 shows a robot device arrangement.

FIG. 2 shows a flow chart that illustrates a training method according to various embodiments.

Figure 3 shows an arrangement for recording demonstrations by a user.

FIGS. 4A to 4E show representations which illustrate the combination of Gaussian components in local coordinate systems to form a combined Gaussian mixture model.

Figure 5 shows a flow chart illustrating a method for controlling a robot device according to an embodiment.

The various embodiments, in particular the exemplary embodiments described below, can be implemented by means of one or more circuits. In one embodiment, a “circuit” can be understood as any type of entity implementing logic, which can be hardware, software, firmware, or a combination thereof. Thus, in one embodiment, a “circuit” may be a hardwired logic circuit or a programmable logic circuit such as a programmable processor, for example a microprocessor. A “circuit” can also be software that is implemented or executed by a processor, for example any type of computer program. Any other type of implementation of the respective functions, which are described in more detail below, may be understood as a “circuit” in accordance with an alternative embodiment.

FIG. 1 shows a robot device arrangement 100.

The robot device arrangement 100 includes a robot device 101, for example an industrial robot in the form of a robot arm for moving, assembling or processing a workpiece. The robot device 101 has robot limbs 102, 103, 104 and a base (or generally a bracket) 105 by which the robot limbs 102, 103, 104 are supported. The term "robot limb" refers to the moving parts of the robot device 101, the actuation of which enables a physical interaction with the environment, for example in order to carry out a task. For control purposes, the robot device arrangement 100 contains a controller 106 which is set up to implement the interaction with the environment in accordance with a control program. The last element 104 (seen from the base 105) of the robot limbs 102, 103, 104 is also referred to as the end effector 104 and can include one or more tools such as a welding torch, a gripping tool, painting equipment, or the like.

The other robot limbs 102, 103 (closer to the base 105) can form a positioning device so that, together with the end effector 104, a robot arm (or articulated arm) is provided with the end effector 104 at its end. The robotic arm is a mechanical arm that can perform similar functions as a human arm (possibly with a tool on its end).

The robot device 101 may include connectors 107, 108, 109 that connect the robot limbs 102, 103, 104 to one another and to the base 105. A connector 107, 108, 109 can have one or more joints, each of which can provide rotational movement and / or translational movement (i.e., displacement) for associated robot limbs relative to one another. The movement of the robot limbs 102, 103, 104 can be initiated with the aid of actuators that are controlled by the controller 106.

The term "actuator" can be understood as a component that is capable of influencing a mechanism in response to being driven. The actuator can convert instructions (so-called activation) issued by the controller 106 into mechanical movements. The actuator, for example an electromechanical converter, can be set up to convert electrical energy into mechanical energy in response to its activation.

The term “controller” (also referred to as “control device”) can be understood as any type of logic-implementing unit that can contain, for example, a circuit and/or a processor that is able to execute software, firmware, or a combination thereof, and that can issue instructions, for example to an actuator in the present example. The controller can be set up, for example, by program code (for example software) to control the operation of a system, in the present example a robot. In the present example, the controller 106 includes one or more processors 110 and a memory 111 that stores code and data based on which the processor 110 controls the robot device 101. According to various embodiments, the controller 106 controls the robot device 101 based on a statistical model 112 stored in the memory 111.

A robot, as implemented by the robot device arrangement 100, can learn from demonstrations to perform a task or to work together with a human partner. Human demonstrations can be encoded by a probabilistic model (also known as a statistical model) that represents the nominal plan of the task for the robot. The controller 106 can then use the statistical model to generate the desired robot movements, possibly depending on the state of the human partner and of the environment.

In order to teach a robot a skill, such as moving according to a desired trajectory, kinesthetic demonstrations can be performed in which the robot is moved directly, e.g. by physically pushing or using a remote control.

Besides the experience it requires, the safety risks and the physical demands (e.g. for tasks that require precise movements), moving the robot in order to perform a task is also much less intuitive for a human than using their own hands.

Alternative approaches track objects in the working area of the robot in order to achieve the desired effects to be produced on the objects. However, this requires additional modeling of the robot in order to combine the desired effects with movements that must be carried out by the robot.

With regard to the above, according to various exemplary embodiments, an approach is provided that enables a human user to teach an activity (skill) to a robot by simply performing the activity themselves. Demonstrations are recorded, for example, by tracking the hand of the user (and optionally the objects involved) instead of recording the trajectory of the end effector. The demonstrations are then used to learn a compact mathematical representation of the skill, which can be used (e.g. by the controller 106) to reproduce that skill with the robot in new scenarios (e.g. new relative positions between the robot and the object to be manipulated).

Various exemplary embodiments are based on technical advances in two areas: firstly, tracking of a hand based on camera images is typically available in areas where robots are used, for example in factories; and secondly, methods for training robots based on human demonstrations allow both efficient learning by the robot (i.e. training of the robot) and flexible reproduction. An example of this are TP-HSMMs (task-parameterized hidden semi-Markov models), which enable a task-parameter-dependent representation of learned movement skills.

Tracking objects and human hands is an active research area (especially in machine vision) and is of great importance for industrial applications. In contrast to the application of corresponding techniques to human-machine interaction (such as for video games, for example), it is used according to various embodiments for training (teaching) and learning of robots.

FIG. 2 shows a flow diagram 200 that illustrates a training method according to various embodiments.

In 201, the demonstration phase, a user (or generally a demonstration instance) demonstrates the desired skill. The demonstration is recorded. For example, a video recording is created by means of a camera, and the sequence of positions of a hand of the user (generally of a part of the demonstration instance) is determined from the images of the video and represented in the form of a trajectory. This is repeated for several demonstrations. It should be noted that this can happen in a decoupled manner, that is, for example, using a set of videos that were previously recorded with no intention of teaching a skill to a robot.

In 202, the learning or training phase, a mathematical model is learned from the collected demonstrations. For example, a TP-HSMM is learned that contains a hand pose as one of the task parameters. A "pose" contains, for example, information about position and / or orientation or also about status (e.g. "hand closed" versus "hand open").

In 203, the reproduction or execution phase, the learned mathematical model is used to control the robot in such a way that it executes the skill within a demonstrated scenario or also a non-demonstrated scenario. For example, the controller 106 generates a trajectory that is suitable for the movement behavior of the robot and controls the robot according to it, the position of the end effector 104 assuming the role of the hand pose task parameter. The hand (generally the part of the demonstration instance) is thus illustratively mapped onto the end effector (generally the part of the robot device), or the two parts are identified with one another (within the framework of the robot control), and the part of the robot device imitates the part of the demonstration instance. This differs, for example, from training a robot by moving it directly (e.g. pushing and pulling) or by using a remote control. In those cases, the part of the robot device does not imitate a part of the demonstration instance.

The trajectory for controlling the robot (also referred to as a reference trajectory) can be generated in various ways.

For example, a trajectory that is feasible for the robot is generated by taking the robot's movement behavior into account in the trajectory planning; alternatively, a trajectory that is only partially feasible for the robot is generated and the robot is controlled in such a way that it follows the trajectory as closely as physically possible.

FIG. 3 shows an arrangement 300 for recording demonstrations by a user. A user 301 demonstrates a skill by moving his hand 302. For example, he takes an object 304 from a first position 305 and moves it to a second position 306. A camera 307 records the movement of the user. Several cameras can also be provided which record the demonstration from different angles, in particular from the perspective of the start position 305 and from the perspective of the end position 306 of the object 304.

Each demonstration is thus represented as a sequence of images which is fed to a control device 308, which corresponds to the controller 106, for example. Based on the demonstrations, the control device 308 learns a statistical model 309 that corresponds to the statistical model 112, for example.

For a desired activity (for example to manipulate an object), the user 301 can carry out a demonstration according to various alternatives.

For example, as a first alternative, the user 301 can proceed as follows:

a) The recording system (e.g. comprising the one or more cameras 307 and the control device 308) is preconfigured so that it can track one or more objects 304 that are of interest for the skill and the hand 302 (or both hands) of the user 301.

b) The user demonstrates how the activity can be carried out. The control device records the associated trajectories of the hand 302 (or hands) and of the one or more objects 304 and stores them.

c) The user repeats the demonstration from b) with different system configurations, for example different starting positions of his hands, different start positions 305 of the object 304 or different end positions 306 of the object 304.

As a second alternative, the user 301 can proceed, for example, as follows:

a) The user 301 records raw output data of the recording system while carrying out the activity, for example a video recording obtained from one or more cameras 307, for example RGBD camera sensors.

b) The associated trajectory of the hand (or hands) 302 of the user is then determined from the recorded data by means of abstraction.

c) The user repeats the demonstration of a) and the determination of an associated trajectory of b) with different system configurations, for example different start positions of his hands, different start positions 305 of the object 304 or different end positions 306 of the object 304.

It should be noted that the first alternative has higher demands on the real-time tracking performance of the recording system.

If the skill to be learned by the robot is, for example, to pick up the object 304 and place it at the end position 306, the recording system must track the hand 302 of the user together with the object 304 and possible end positions of the object 304 (for example pallets).

It is assumed that at the end of 201 the demonstrations have been abstracted (for example, represented as sequences of coordinates of the hand 302 or of the object 304) and stored as trajectories (for example of the hand 302, of the object 304, or also of several hands and/or several objects), for example in a memory of the control device 308.

In 202, according to one embodiment, a movement model (a hand movement model in the following example) is now learned, which can be viewed as a high-level movement model. According to one embodiment, a TP-HSMM (task-parameterized hidden semi-Markov model) is trained as the statistical model 309 for this purpose. A TP-HSMM enables both efficient learning and flexible reproduction when learning robot skills from human demonstrations. More specifically, the recorded trajectory of the user's hand 302 is treated as the desired movement to be learned, while the trajectory of the object 304 is used to generate different task parameters for the skill, which represent different configurations of the work area.

It should be noted that when a hand of the user and an object are referred to below, the model can also be applied analogously to both hands of the user and/or to several objects. The object or objects and the hand or hands (or other body parts such as an arm) that are involved in the activity are also referred to collectively below as "elements that are involved in the activity".

It should also be noted that the demonstrations need not necessarily be performed by a human user. In special applications, it could also be desired that a robot learns an activity from an animal or from a machine, for example another robot. In this sense, the demonstration instance (also referred to as a trainer or demonstration unit) that demonstrates an activity can also be an animal or another robot.

In other words, the model encapsulates how the user's hand 302 moves given various configurations of the workspace. For example, for the same skill of picking up and placing the object 304, the model learns different pick-up techniques (such as picking up the object 304 from above or from the side) for different orientations of the object 304.

The goal of training the mathematical model is to enable the robot to reproduce the demonstrated skill using its robot arm. Given a new scenario regarding the state of the robot arm and the position of the object 304, the control device 308 uses the model 309 to generate a reference trajectory and controls the robot in such a way that it follows the reference trajectory and thereby reproduces the skill for this new scenario. The control algorithm according to which the control device 308 controls the end effector 104 of the robot arm in order to follow the reference trajectory is referred to as tracking control, for which many different realizations are possible depending on the particular dynamic behavior of the robot 101.
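One possible realization of such tracking control can be sketched with a finite-horizon linear-quadratic regulator. The following is a minimal sketch under stated assumptions, not the controller of the embodiments: the end effector is modeled as a one-dimensional double integrator, the weights `Q` and `R`, the time step and the minimum-jerk reference are illustrative choices, and all function names are hypothetical. Only the deterministic regulator is shown; the state estimator (Kalman filter) that a full linear-quadratic Gaussian controller would add is omitted.

```python
import numpy as np


def lqr_gains(A, B, Q, R, horizon):
    """Backward Riccati recursion; returns feedback gains K_0 .. K_{horizon-1}."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]


def build_reference(n_move=100, n_hold=100, dt=0.02):
    """Hypothetical reference: minimum-jerk motion from 0 to 1, then hold."""
    tau = np.linspace(0.0, 1.0, n_move)
    move = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    positions = np.concatenate([move, np.ones(n_hold)])
    velocities = np.gradient(positions, dt)
    return np.column_stack([positions, velocities])


def track_reference(reference, dt=0.02):
    """Follow a (T, 2) reference of [position, velocity] with error feedback."""
    A = np.array([[1.0, dt], [0.0, 1.0]])   # assumed double-integrator model
    B = np.array([[0.5 * dt**2], [dt]])
    Q = np.diag([100.0, 1.0])               # assumed state-error weights
    R = np.array([[0.01]])                  # assumed control-effort weight
    gains = lqr_gains(A, B, Q, R, len(reference) - 1)
    x = np.zeros(2)
    states = [x]
    for K, x_ref in zip(gains, reference[:-1]):
        u = -K @ (x - x_ref)                # feedback on the tracking error
        x = A @ x + B @ u
        states.append(x)
    return np.array(states)


reference = build_reference()
states = track_reference(reference)
print("final position error:", abs(states[-1, 0] - reference[-1, 0]))
```

Because the controller acts only on the tracking error, the end effector lags slightly behind the reference while it accelerates and converges once the reference holds still; adding a feedforward term would reduce that lag.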

Due to the difference between human movement behavior and robot movement behavior, the case may arise that the reference trajectory generated by the control device 308 based on the model 309 is only partially feasible for a scenario, i.e. it is possible that the robot cannot follow it completely. In this case, the control device 308 can control the robot 101 in such a way that the end effector 104 follows the determined reference trajectory as closely as physically possible.

The following describes the mathematical details on the basis of which the controller 308 can train the statistical model in 202 and control the robot 101 in 203 to perform an activity for which it has been trained.

In the following, a robot 101 with several degrees of freedom is assumed. The state of the end effector 104 is denoted by $r \in M_r$, where $M_r$ denotes the working manifold of the robot. For example, each point of the manifold $M_r$ can specify the position of the end effector 104 in 3D Cartesian coordinates, its orientation as a quaternion and, in the case of a gripper as a tool, the state of the gripper (e.g. how far the gripper is open, or just whether it is open or closed).

It should be noted that an important property of this formulation is that $r$ can represent not only the state of the robot end effector 104 but also a state of the hand 302 of the user 301.

It is further assumed that the robot works in a static, known working environment and that within this working environment the hand 302 of the user can be tracked precisely (i.e. cameras 307 are provided in sufficient number and arrangement). Within the reach of the robot there are objects (of interest) denoted by $O = \{o_1, \dots, o_J\}$. Without loss of generality, it is assumed that the state $p$ of each object lies in an object configuration manifold $M_p$. For example, each point of the manifold $M_p$ can specify a possible position of an object 304 in 3D Cartesian coordinates together with a possible orientation as a quaternion. Further, it is assumed that there is a set of core manipulation skills that enable the robot to manipulate (e.g. move) the objects. The set of these core manipulation skills is denoted by $A = \{a_1, \dots, a_H\}$.

For each activity (corresponding to a skill), the user 301 carries out several demonstrations that define how the robot 101 is to carry them out.

In particular, for a skill $a$ from the set of core manipulation skills, a set of objects $O_a$ is involved, and the set of demonstrations is denoted by $D_a = \{D_1, \dots, D_{M_a}\}$, each demonstration being given by

$$D_m = [s_t]_{t=1}^{T_m} = \left[\left(r_t, \{p_{t,o},\ o \in O_a\}\right)\right]_{t=1}^{T_m} \qquad (1)$$

where $D_m$ is a sequence of states $s$, each state $s_t$ specifying the desired end-effector state $r_t$ (at time $t$), as obtained from tracking the hand (by recording camera images), and the object states $\{p_{t,o},\ o \in O_a\}$, each in the object configuration manifold. By means of a combination of these skills, the robot 101 can manipulate the respective objects so that they reach a desired end state.

In summary, several demonstrations by the user are tracked for each skill to be learned (for example for each of the core skills), each demonstration being represented according to (1) (for example in a corresponding data structure in the control device 308). Each demonstration contains the recorded trajectory of the user's hand as the desired robot end-effector states $\{r_t\}$ and the recorded states $\{p_{t,o}\}$ of all objects involved in the skill.
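The demonstration data described above (a time-indexed sequence of states, each combining the tracked hand state with the states of the objects involved) could be held in a data structure such as the following. This is a minimal sketch; the class names, field names and the object identifier `"box"` are hypothetical and are not taken from the embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed order (w, x, y, z)


@dataclass
class Pose:
    position: Vec3       # 3D Cartesian coordinates
    orientation: Quat    # orientation as a quaternion


@dataclass
class State:
    hand: Pose                 # tracked hand pose, used as desired end-effector pose
    gripper_open: bool         # coarse hand state ("hand open" vs. "hand closed")
    objects: Dict[str, Pose]   # pose of each object involved, keyed by object id


@dataclass
class Demonstration:
    states: List[State] = field(default_factory=list)

    def hand_positions(self) -> List[Vec3]:
        """Positions of the hand over time, i.e. the trajectory to be learned."""
        return [s.hand.position for s in self.states]


identity: Quat = (1.0, 0.0, 0.0, 0.0)
demo = Demonstration()
demo.states.append(State(Pose((0.0, 0.0, 0.2), identity), True,
                         {"box": Pose((0.3, 0.0, 0.0), identity)}))
demo.states.append(State(Pose((0.3, 0.0, 0.05), identity), False,
                         {"box": Pose((0.3, 0.0, 0.0), identity)}))
print(demo.hand_positions())
```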

The number of states $T_m$ results from the duration of the demonstration and the sampling rate, i.e. the rate at which the positions of the hand (or hands) 302 and of the object (or objects) 304 are determined. For example, if a demonstration lasts ten seconds and the sampling rate is set to 50 Hz, the demonstration consists of 500 training vectors. If a demonstration phase consists of five different demonstrations, the final training data set consists of 2500 training vectors, assuming that each demonstration lasts ten seconds (at a fixed sampling rate of 50 Hz). The duration of the demonstration typically has no influence on the training process of the statistical model 309.
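The arithmetic of this example can be checked with a few lines of code. The function name is hypothetical, and rounding to whole samples is an assumption for durations that are not exact multiples of the sampling period.

```python
def num_training_vectors(durations_s, sampling_rate_hz):
    """Total number of sampled states over all demonstrations."""
    return sum(round(duration * sampling_rate_hz) for duration in durations_s)


print(num_training_vectors([10.0], 50))      # one 10 s demonstration  -> 500
print(num_training_vectors([10.0] * 5, 50))  # five 10 s demonstrations -> 2500
```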

The desired trajectory of the robot end-effector states when performing the activity (for the same scenario) is given by the recorded (tracked) trajectory of the user's hand.

The control device 308 trains, for example, a TP-HSMM as a mathematical model 309. The trained TP-HSMM represents both spatial and temporal information with regard to how the robot arm should move with respect to the objects involved. By taking into account different perspectives (e.g. the perspectives of different objects), the controller can adapt the robot trajectory to new (i.e. non-demonstrated) scenarios in which the objects are in different places.

The idea on which learning from demonstrations is based can be seen as fitting a given skill model, such as a GMM (Gaussian mixture model), which can be part of the TP-HSMM, to a given set of demonstrations. It is assumed that $M$ demonstrations are given, each demonstration according to (1) containing $T_m$ data points, for a data set of $N = \sum_m T_m$ total observations $\xi = \{\xi_t\}_{t=1}^{N}$, where for simplicity $\xi_t \in \mathbb{R}^d$ is assumed.

It is also assumed that each demonstration is recorded from the perspective of $P$ different coordinate systems, which are referred to as task parameters.

One possibility of obtaining such task-parameterized data for a demonstration is to transform the end-effector states $\xi_t$ observed for the demonstration, which are given in a global coordinate system, into each coordinate system $p \in \{1, \dots, P\}$ by

$$\xi_t^{(p)} = \left(A_t^{(p)}\right)^{-1} \left(\xi_t - b_t^{(p)}\right)$$

where $b_t^{(p)}$ and $A_t^{(p)}$ are the translation and rotation of the coordinate system $p$ with respect to the global coordinate system at time $t$. It is assumed that $\{(b_t^{(p)}, A_t^{(p)})\}_{p=1}^{P}$ are available. For example, each coordinate system is the local coordinate system of an object, and the object positions and object rotations are tracked.
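The transformation of observations from the global coordinate system into a local one can be sketched as follows, assuming each local coordinate system is described by a rotation matrix `A` and an origin `b` expressed in global coordinates. The function names and the two-dimensional example values are hypothetical.

```python
import numpy as np


def to_local_frame(xi_global, A, b):
    """Map points of shape (T, d) from the global frame into the frame (A, b).

    Computes A^{-1} (xi - b) for every row without forming the inverse.
    """
    return np.linalg.solve(A, (xi_global - b).T).T


def to_global_frame(xi_local, A, b):
    """Inverse mapping: A xi + b for every row."""
    return xi_local @ A.T + b


# Hypothetical 2D frame: origin at (1, 2), rotated by 90 degrees.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
b = np.array([1.0, 2.0])

points = np.array([[1.0, 3.0],   # one unit along the frame's local x axis
                   [1.0, 2.0]])  # the frame origin itself
local = to_local_frame(points, A, b)
print(local)
print(np.allclose(to_global_frame(local, A, b), points))
```

Applying the same trajectory to the frame of each tracked object yields one local copy of the data per coordinate system, which is what a task-parameterized model is trained on.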

One possibility for a statistical model 309 is a TP-GMM (task-parameterized Gaussian mixture model). It can be described by the tuple

$$\left\{\pi_k, \left\{\mu_k^{(p)}, \Sigma_k^{(p)}\right\}_{p=1}^{P}\right\}_{k=1}^{K}$$

where $K$ denotes the number of Gaussian components in the mixture model, $\pi_k$ denotes the prior probability of the $k$-th component, and $\mu_k^{(p)}$ and $\Sigma_k^{(p)}$ are the mean and the covariance of the $k$-th component in the coordinate system $p$.

Such a mixture model (in contrast to simple GMMs) cannot be learned independently for each coordinate system. The reason is that the mixing coefficients $\pi_k$ are shared by all coordinate systems (i.e. apply to all coordinate systems) and the $k$-th component in the coordinate system $p$ must correspond to the $k$-th component in the global coordinate system. One way of learning (i.e. training) such models is the EM (expectation-maximization) algorithm.

When a TP-GMM has been trained, it can be used by the control device 308 during execution (in 203 in the sequence of FIG. 2) to reproduce a trajectory for a learned skill. Namely, given the observed coordinate systems $\{(b_t^{(p)}, A_t^{(p)})\}_{p=1}^{P}$ (which represent, for example, a new scenario by specifying the position and orientation of each object through the position and orientation of the respective local coordinate system), the control device 308 can create a single combined GMM with the parameters $\{\pi_k, (\hat{\mu}_{t,k}, \hat{\Sigma}_{t,k})\}_{k=1}^{K}$ by multiplying the affine-transformed Gaussian components (of the different coordinate systems). The resulting combined GMM is given by

$$\hat{\Sigma}_{t,k} = \left(\sum_{p=1}^{P} \left(\hat{\Sigma}_{t,k}^{(p)}\right)^{-1}\right)^{-1}, \qquad \hat{\mu}_{t,k} = \hat{\Sigma}_{t,k} \sum_{p=1}^{P} \left(\hat{\Sigma}_{t,k}^{(p)}\right)^{-1} \hat{\mu}_{t,k}^{(p)}$$

where the parameters of each Gaussian component in each local coordinate system (for the new scenario) are given by

$$\hat{\mu}_{t,k}^{(p)} = A_t^{(p)} \mu_k^{(p)} + b_t^{(p)}, \qquad \hat{\Sigma}_{t,k}^{(p)} = A_t^{(p)} \Sigma_k^{(p)} \left(A_t^{(p)}\right)^{\top}$$

More details can be found in the paper cited at the beginning.

FIGS. 4A to 4E show representations which illustrate the combination of Gaussian components in local coordinate systems to form a combined Gaussian mixture model.

FIG. 4A corresponds to a first local coordinate system, for example the local coordinate system at the starting position 305 (which corresponds to the origin in FIG. 4A).

FIG. 4B corresponds to a second local coordinate system, for example the local coordinate system at the target position 306 (which corresponds to the origin in FIG. 4B).

FIG. 4A illustrates the movement of an object 304 to be moved from the point of view of the starting position 305. The starting position is always the same (from the point of view of the starting position), so the course of the trajectories is well defined there (i.e. the scatter of the trajectories is small there). In other words, the local model is very confident there. Further away from the starting position, the model becomes less certain because it has seen various demonstrations in which the target position was in different places. The uncertainty is illustrated by the larger ellipses and the divergence of the trajectories. Each ellipse illustrates a Gaussian component: it is centered on the mean of the component, and its size and shape correspond to the component's covariance matrix.

The situation at the target position is analogous: the trajectories scatter more at a greater distance from the target position (because in different demonstrations the starting position was at different points relative to the target position). Closer to the target position, the trajectories scatter less (because in each demonstration the object 304 has finally arrived at the target position 306).

For a new scenario, the model for the starting position, shown in FIG. 4C, is likewise very confident in the vicinity of the starting position, but uncertain at a greater distance from the starting position.

Similarly, the model for the target position, shown in FIG. 4D, is very confident in the vicinity of the target position, but uncertain at a greater distance.

FIG. 4E now shows the combination of the two local models to form a combined model according to (3) to (6).

Compared to the two local models, the combined model is much more confident in the middle, between the starting and target positions. Illustratively, the control device 308 can now (in 203) control the robot 101 in such a way that it traverses the ellipses between the starting position and the target position (e.g. through their centers) in order to perform the activity in the new scenario.
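The combination of local Gaussian components into one component of the combined model can be sketched as a product of affine-transformed Gaussians, following the formulation in the tutorial cited at the beginning. The function name `combine_component` and the numerical values are assumptions chosen for illustration; the example uses two coordinate systems (start and target) with identity rotation.

```python
import numpy as np

def combine_component(mus, sigmas, frames):
    """Fuse one Gaussian component across all frames for a new scenario.

    mus[p], sigmas[p]: mean and covariance of the component in frame p.
    frames[p] = (A, b): rotation and translation of frame p in the new scenario.
    """
    d = len(mus[0])
    precision_sum = np.zeros((d, d))
    weighted_mean_sum = np.zeros(d)
    for mu, sigma, (A, b) in zip(mus, sigmas, frames):
        mu_g = A @ mu + b              # affine-transformed mean
        sigma_g = A @ sigma @ A.T      # affine-transformed covariance
        precision = np.linalg.inv(sigma_g)
        precision_sum += precision
        weighted_mean_sum += precision @ mu_g
    sigma_hat = np.linalg.inv(precision_sum)
    mu_hat = sigma_hat @ weighted_mean_sum
    return mu_hat, sigma_hat

# Hypothetical component: confident in the start frame, less so in the target frame.
frames = [(np.eye(2), np.array([0.0, 0.0])),   # start frame
          (np.eye(2), np.array([4.0, 0.0]))]   # target frame
mus = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
sigmas = [np.eye(2), 3.0 * np.eye(2)]
mu_hat, sigma_hat = combine_component(mus, sigmas, frames)
```

The fused covariance is smaller than either local covariance, and the fused mean lies closer to the prediction of the more confident frame, which matches the behavior described for FIG. 4E.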

As mentioned, a TP-HSMM is used as statistical model 309 according to various embodiments.

An HSMM (hidden semi-Markov model) extends a simple HMM (hidden Markov model) by embedding temporal information in the underlying stochastic process. In an HMM, the underlying stochastic process is assumed to have the Markov property, i.e. the probability of moving to the next state depends only on the current state. In an HSMM, by contrast, the probability of moving to the next state depends on the current state and on the dwell time in the current state. HSMMs are typically used in speech synthesis in particular.

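According to one embodiment, a task-parameterized HSMM (TP-HSMM) is represented by

$$\left\{ \{a_{hk}\}_{h=1}^{K}, (\mu_k^{D}, \sigma_k^{D}), \pi_k, \{\mu_k^{(p)}, \Sigma_k^{(p)}\}_{p=1}^{P} \right\}_{k=1}^{K},$$

where $a_{hk}$ denotes the transition probability from state $h$ to state $k$, $(\mu_k^{D}, \sigma_k^{D})$ describe the Gaussian distributions for the dwell time in state $k$ (the superscript $D$ is not an index here, but only indicates the relation to the dwell time), and $\{\pi_k, \{\mu_k^{(p)}, \Sigma_k^{(p)}\}_{p=1}^{P}\}_{k=1}^{K}$ is an associated TP-GMM as described above.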

The TP-GMM describes the output probabilities (or emission probabilities, i.e. probabilities for the observations) for each state k = 1, ... K.

In the case of an HSMM, however, the prior probabilities $\pi_k$ only describe the probability distribution of the initial components at $t = 1$. The probabilities at later times are given by the underlying semi-Markov model.

In the TP-HSMM under consideration, each state corresponds to a Gaussian component in the associated TP-GMM.

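It may be desirable for the structure of the TP-HSMM to be linear, which means that the sequence of states is deterministic and only the dwell time is random. Such a linear structure, which can accelerate training, is obtained by setting the prior probabilities to $\pi_k = \delta_{1k}$ and the transition probabilities from state $h$ to state $k$ to $a_{hk} = \delta_{(h+1)k}$, with $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ for $i \neq j$.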
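A linear structure of this kind, with a deterministic state order and random dwell times, can be sketched as follows. The helper names `linear_hsmm` and `sample_state_sequence`, the dwell-time parameters, and the choice to stop after the last state (whose row of transition probabilities is left at zero) are assumptions made for illustration only.

```python
import numpy as np

def linear_hsmm(K):
    """Prior and transition matrix of a linear (left-to-right) structure."""
    pi = np.zeros(K)
    pi[0] = 1.0                      # always start in the first state
    a = np.zeros((K, K))
    for h in range(K - 1):
        a[h, h + 1] = 1.0            # state h is always followed by state h + 1
    return pi, a

def sample_state_sequence(pi, a, dwell_mean, dwell_std, rng):
    """Sample one state per time step; only the dwell times are random."""
    K = len(pi)
    k = int(rng.choice(K, p=pi))
    seq = []
    while True:
        # Gaussian dwell time, rounded to at least one time step
        d = max(1, int(round(rng.normal(dwell_mean[k], dwell_std[k]))))
        seq.extend([k] * d)
        if a[k].sum() == 0.0:        # last state: no outgoing transition
            break
        k = int(rng.choice(K, p=a[k]))
    return seq

pi, a = linear_hsmm(3)
dwell_mean = np.array([5.0, 10.0, 3.0])   # hypothetical dwell-time means
dwell_std = np.array([1.0, 2.0, 0.5])     # hypothetical dwell-time deviations
seq = sample_state_sequence(pi, a, dwell_mean, dwell_std, np.random.default_rng(0))
```

Every sampled sequence visits the states in the same order; repeated calls differ only in how long each state is held.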

The TP-HSMM is trained in 202 (for example by the control device 308) on the basis of the demonstrations of the user 301, for example using an EM (expectation maximization) procedure. The result of the training is a set of parameter values that characterizes the TP-HSMM.

In 203 the control device 308 can then control the robot 101 based on the trained model 309 in order to carry out an activity, for example for a new scenario. During execution, the controller uses the trained statistical model 309 to determine a reference trajectory for the new scenario and controls the robot to follow the reference trajectory. The term "scenario" refers to a special choice of the modeled task parameters (e.g. starting position 305 and target position 306).

For a desired starting state and a desired target state to be reached at a point in time $T$, the most likely sequence of states $k_1, k_2, \ldots, k_T$ can be determined using the Viterbi algorithm for HSMMs. This sequence of states is the sequence of states which the robot 101 is to go through when reproducing the activity.
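How a most likely state sequence can be computed for a model with explicit dwell times is sketched below. This is a generic explicit-duration Viterbi recursion in the log domain, not necessarily the variant used in the embodiments. The function name `hsmm_viterbi`, the maximum duration `d_max`, the requirement that the last segment ends exactly at the final time step, and all numerical values are assumptions. Emission log-likelihoods are expected to be finite; time steps without an observation are represented by zeros.

```python
import numpy as np

def hsmm_viterbi(log_pi, log_a, log_dur, log_b):
    """Most likely state per time step for an explicit-duration HSMM.

    log_pi: (K,) initial log-probabilities
    log_a: (K, K) transition log-probabilities, row = from, column = to
    log_dur: (K, d_max) log-probability of dwelling d = 1..d_max steps
    log_b: (T, K) finite emission log-likelihood of each step under each state
    """
    T, K = log_b.shape
    d_max = log_dur.shape[1]
    cum_b = np.vstack([np.zeros(K), np.cumsum(log_b, axis=0)])
    delta = np.full((T + 1, K), -np.inf)   # best score of a segment of j ending at t
    best_d = np.zeros((T + 1, K), dtype=int)
    best_prev = np.full((T + 1, K), -1, dtype=int)
    for t in range(1, T + 1):
        for j in range(K):
            for d in range(1, min(d_max, t) + 1):
                seg = log_dur[j, d - 1] + cum_b[t, j] - cum_b[t - d, j]
                if d == t:                 # first segment of the sequence
                    score, prev = log_pi[j] + seg, -1
                else:                      # segment preceded by another state
                    cand = delta[t - d] + log_a[:, j]
                    prev = int(np.argmax(cand))
                    score = cand[prev] + seg
                if score > delta[t, j]:
                    delta[t, j], best_d[t, j], best_prev[t, j] = score, d, prev
    j = int(np.argmax(delta[T]))
    if not np.isfinite(delta[T, j]):
        raise ValueError("no feasible state sequence")
    states = np.empty(T, dtype=int)
    t = T
    while t > 0:                           # backtrack segment by segment
        d = best_d[t, j]
        states[t - d:t] = j
        t, j = t - d, best_prev[t, j]
    return states

# Hypothetical linear three-state model; only start and target are observed.
K, T, d_max = 3, 10, 10
with np.errstate(divide="ignore"):
    log_pi = np.log(np.array([1.0, 0.0, 0.0]))
    log_a = np.log(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]))
mu_d, sd_d = np.array([3.0, 4.0, 3.0]), np.ones(3)
d = np.arange(1, d_max + 1)
log_dur = (-0.5 * ((d[None, :] - mu_d[:, None]) / sd_d[:, None]) ** 2
           - np.log(sd_d[:, None] * np.sqrt(2.0 * np.pi)))
log_b = np.zeros((T, K))
log_b[0] = [0.0, -50.0, -50.0]     # start observation fits only the first state
log_b[-1] = [-50.0, -50.0, 0.0]    # target observation fits only the last state
states = hsmm_viterbi(log_pi, log_a, log_dur, log_b)
```

Because the dwell-time means in this example add up to the horizon, the recovered durations coincide with those means.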

Such a state is, for example, the state represented by the (small) ellipse in FIG. 4E. It corresponds to a component of the combined TP-GMM for the new scenario (see the description of FIG. 4). Illustratively, the TP-HSMM contains the information on how the states of the TP-GMM are to be traversed over time (given by the probability distributions for the dwell time in each state and the transition probabilities between the states). For each state $k_t$, the control device 308 determines a corresponding end effector state, for example by means of LQG (linear quadratic Gaussian) control. As a reference, it can take the mean value of the respective TP-GMM component. The control device ensures that the differences between successive end effector states are not too large (according to the selected controller parameters). During execution, the robot then tries to follow these end effector states; the state $k_t$ is only an intermediate quantity.

Because human movement behavior differs from robot movement behavior, the robot may not be able to follow the sequence of states exactly. This can also be due to the fact that the reference is a synthetic trajectory obtained from a trained model. For example, minimal intervention LQG control may be used by the control device 308 to generate a trajectory from such a sequence of states of Gaussian components. Such control methods typically have both an optimization goal, which is to avoid or penalize a position error (here the distance between the end effector state and the mean of the corresponding component), and a constraint reflecting the capabilities of the physical system (how successive end effector states can be chosen).

The latter in particular is important here in order to account for the difference between the demonstration entity and the robot.
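The trade-off between tracking the component means and limiting the change between successive states can be sketched with a batch linear-quadratic tracking problem. This is a simplified stand-in and not the minimal intervention LQG controller itself: the function name `batch_lq_tracking`, the single-integrator dynamics, the scalar control weight `r`, and the step-shaped reference are assumptions made for illustration.

```python
import numpy as np

def batch_lq_tracking(A, B, x1, mu_ref, Q_list, r):
    """Minimize sum_t (mu_t - x_t)' Q_t (mu_t - x_t) + r * sum_t |u_t|^2
    subject to x_{t+1} = A x_t + B u_t, solved in one least-squares step."""
    T = len(mu_ref)
    n, m = B.shape
    S_x = np.zeros((T * n, n))
    S_u = np.zeros((T * n, (T - 1) * m))
    A_pow = [np.linalg.matrix_power(A, i) for i in range(T)]
    for t in range(T):
        S_x[t * n:(t + 1) * n] = A_pow[t]
        for s in range(t):
            S_u[t * n:(t + 1) * n, s * m:(s + 1) * m] = A_pow[t - 1 - s] @ B
    Q = np.zeros((T * n, T * n))
    for t, Q_t in enumerate(Q_list):
        Q[t * n:(t + 1) * n, t * n:(t + 1) * n] = Q_t
    R = r * np.eye((T - 1) * m)
    # Stacked states: x = S_x x1 + S_u u; solve the normal equations for u
    rhs = S_u.T @ Q @ (mu_ref.reshape(-1) - S_x @ x1)
    u = np.linalg.solve(S_u.T @ Q @ S_u + R, rhs)
    x = S_x @ x1 + S_u @ u
    return x.reshape(T, n), u.reshape(T - 1, m)

# Hypothetical 1-D example: the reference jumps between two component means.
A = np.array([[1.0]])                 # single integrator: x_{t+1} = x_t + u_t
B = np.array([[1.0]])
x1 = np.array([0.0])
mu_ref = np.array([[0.0]] * 5 + [[1.0]] * 5)
Q_list = [np.eye(1)] * 10             # could be the inverse component covariances
x, u = batch_lq_tracking(A, B, x1, mu_ref, Q_list, r=1.0)
```

A larger `r` yields smaller steps between successive states at the cost of a larger deviation from the reference; using the inverse covariance of each component as `Q_t` would penalize deviations less where the model is uncertain.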

In general, the determined sequence can be viewed as the desired sequence of states, and the control device 308 controls the robot 101 so that it follows this reference as closely as physically possible.

A non-Euclidean state space often occurs in manipulation tasks for a robot. For example, rotation matrices in $SO(3)$ or unit quaternions in $S^3$ are often used to describe the orientation of objects or of the robot end effector. A straightforward approach that projects the solution of a Euclidean computation back onto the manifold typically suffers from poor accuracy and can sometimes lead to serious consequences due to a sign change. Therefore, according to various embodiments, the described TP-HSMM formalism is adapted to Riemannian manifolds.
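The sign problem can be made concrete with unit quaternions, where $q$ and $-q$ describe the same rotation. The sketch below compares a naive Euclidean average with an average computed via logarithmic and exponential maps on the unit sphere. The function names, the quaternion ordering (w, x, y, z), and the fixed number of iterations are assumptions for illustration; the sketch does not reproduce the Riemannian TP-HSMM formulation of the embodiments.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def log_map(p, q):
    """Tangent vector at p pointing to q on the unit sphere (sign-aligned)."""
    if np.dot(p, q) < 0.0:
        q = -q                          # q and -q encode the same rotation
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    v = q - c * p
    n = np.linalg.norm(v)
    return np.zeros(4) if n < 1e-12 else np.arccos(c) * v / n

def exp_map(p, v):
    """Point on the unit sphere reached from p along the tangent vector v."""
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def quat_mean(quats, iters=20):
    """Iterative mean on the sphere using log and exp maps."""
    m = quats[0] / np.linalg.norm(quats[0])
    for _ in range(iters):
        m = exp_map(m, np.mean([log_map(m, q) for q in quats], axis=0))
    return m

# Rotations about the z-axis by +170 and -170 degrees (20 degrees apart).
half = np.deg2rad(85.0)
q1 = np.array([np.cos(half), 0.0, 0.0, np.sin(half)])
q2 = q1 * np.array([1.0, 1.0, 1.0, -1.0])

naive = (q1 + q2) / 2.0
naive = naive / np.linalg.norm(naive)   # Euclidean mean projected back
q_mean = quat_mean([q1, q2])            # mean computed on the manifold
```

In this example the projected Euclidean mean corresponds to no rotation at all, whereas the manifold mean corresponds to a rotation by 180 degrees about the z-axis, which lies between the two inputs.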

In summary, according to various exemplary embodiments, a method for controlling a robot device is provided, as shown in FIG. 5.

FIG. 5 shows a flow chart 500 which illustrates a method for controlling a robot device according to an embodiment.

In 501, an activity is performed by a demonstration entity.

In 502, a sequence of poses of a part of the demonstration entity, by means of which the demonstration entity performs the activity, is recorded.

In 503, training data is generated based on the recorded sequence of poses of the part of the demonstration entity.

In 504, a robot control model is trained by using the sequence of poses of the part of the demonstration entity as the training sequence of poses for a part of the robot.

In 505, the robot device is controlled based on the trained robot control model.

In other words, according to various embodiments, an activity (corresponding to a skill) is demonstrated one or more times by a demonstration entity (e.g. a human user). The demonstration entity is not the robot device itself but is independent of it, i.e. it is, for example, separate from the robot. The demonstrations are recorded and used as the basis for training a model to control the robot device. This means that a model is trained such that the robot device is enabled to carry out the activity itself (possibly also in other scenarios), i.e. to acquire the skill and reproduce the activity. The part of the demonstration entity (e.g. a hand) is used as a model for the part of the robot device (e.g. an end effector). In other words, the model is trained in such a way that the part of the robot device imitates or copies (if necessary, adapted to a new scenario) the demonstrated behavior (for example the demonstrated movement sequences) of the part of the demonstration entity.

The sequence of poses is, for example, a sequence (or series) of positions and/or orientations and/or other states (e.g. "hand open" or "hand closed"). The sequence of poses can be viewed as a sequence of movements. Recording the sequence of poses can be understood as recording information representing the sequence of poses (e.g. containing a representation of each pose, e.g. in the form of a vector or a matrix).

The activity (e.g. picking up and placing an object, turning an object (e.g. a lever), opening an object, etc.) can be carried out in various ways, which depend in particular on the particular scenario in which the activity is carried out, e.g. starting position and start orientation of an object or the robot end effector and desired end position and desired end orientation of an object.

The sequence of poses of the part of the demonstration entity can be recorded based on sensor signals from various sensors, for example video sensors, radar sensors, LiDAR (light detection and ranging) sensors, ultrasonic sensors and motion sensors. For example, one or more sensors are used which enable the part of the demonstration entity (for example a human hand) to be tracked precisely. For example, machine vision techniques can be used that make it possible to determine a pose from RGB (red-green-blue) data or RGBD (red-green-blue-depth) data. Alternatively (or in addition), a dedicated sensor system such as a motion capture system or a hand-tracking glove can be used in order to precisely determine or track the position, orientation, posture and/or state of the part of the demonstration entity (e.g. a human hand) or of an object of interest (e.g. an object manipulated during the activity).

Thus, according to various exemplary embodiments, one or more sensors are used to ultimately generate a control signal for controlling a physical system, such as a computer-controlled machine, a manufacturing machine, an (electrical) tool, a household appliance, a personal assistant, an access control device or a similar device, referred to herein as a “robot device”, which can be controlled in such a way that it is able to achieve movements similar to the demonstration entity, for example a human hand.

The training data for the robot control model are generated from the movement sequence (e.g. a demonstrated trajectory) demonstrated by the demonstration entity and, if applicable, from trajectories of one or more objects that are manipulated during the activity. This enables a corresponding training method to be carried out for the robot control model. A trajectory can be understood as containing (at least) information about position or orientation or both.

The calculation of the control signal is ultimately based on a set of recorded demonstrations, for example on information about the hand and object poses tracked during the demonstrations. A resulting control trajectory for the robot device can then follow the trajectory demonstrated by the demonstration entity, e.g. a human hand, so that it is very similar to it. For example, the control trajectory can be determined (or selected) in such a way that the probability that the demonstration entity would demonstrate the determined control trajectory next is maximized.

The mathematical basis of various embodiments can be seen in the fact that the movement of a tracked part of the demonstration entity (e.g. the pose of a hand) is converted into a trajectory that is feasible for the robot (e.g. the robot end effector). One difficulty here is typically that, for example, the movement behavior of a human hand differs from that of a robot end effector. However, a TP-HSMM allows both alternatives to be taken into account as task parameters during training in order to generate trajectories from the TP-HSMM representation.

Although the invention has been shown and described primarily with reference to particular embodiments, it should be understood by those skilled in the art that numerous changes in design and details can be made therein without departing from the spirit and scope of the invention, as defined by the following claims. The scope of the invention is, therefore, determined by the appended claims, and it is intended that all changes which come within the literal meaning or range of equivalency of the claims be embraced.

Claims

1. A method of controlling a robotic device, comprising:

Carrying out an activity by a demonstration entity;

Recording a sequence of poses of a part of the demonstration entity by means of which the demonstration entity performs the activity;

Generating training data based on the recorded sequence of poses of the part of the demonstration entity;

Training a robot control model by using the sequence of poses of the part of the demonstration entity as a training sequence of poses for a part of the robot; and

Controlling the robot device based on the trained robot control model, wherein recording the sequence of poses comprises recording sensor data by one or more sensors and determining the sequence of poses from the sensor data, and wherein recording the sequence of poses comprises visually recording the performance of the activity by the demonstration entity.

2. The method of claim 1, wherein the visual recording comprises capturing at least one sequence of camera images.

3. The method according to claim 2, wherein the poses are determined based on an image analysis of the sequence of camera images, in which the position and/or orientation of the part of the demonstration entity is tracked in the at least one sequence of camera images.

4. The method according to any one of claims 1 to 3, wherein controlling the robot device based on the trained robot control model comprises generating a reference trajectory from the trained robot control model and tracking the reference trajectory by the robot device by means of a linear quadratic Gaussian (LQG) controller.

5. The method according to any one of claims 1 to 4, comprising determining a sequence of poses of one or more further elements that are involved in the activity, the training data being generated based on the sequence of poses of the one or more further elements.

6. The method according to claim 5, wherein a sequence of poses of an object that is manipulated when performing the activity is determined and wherein the training data are generated based on the sequence of poses of the object.

7. The method according to any one of claims 1 to 6, comprising performing the activity multiple times by the demonstration entity;

Recording, for each performance of the activity by the demonstration entity, a sequence of poses of the part of the demonstration entity;

Generating training data based on the recorded sequences of poses of the part of the demonstration entity and training the robot control model by using the sequences of poses of the part of the demonstration entity as training sequences of poses for the part of the robot.

8. The method according to any one of claims 1 to 7, wherein the robot control model comprises a statistical model.

9. The method according to claim 8, wherein a sequence of poses of at least one object that is manipulated during the execution of the activity is determined and wherein the statistical model is parameterized by means of the determined poses.

10. The method according to any one of claims 1 to 9, wherein the statistical model comprises a task-parameterized Gaussian mixture model.

11. The method according to any one of claims 1 to 10, wherein the statistical model comprises a task-parameterized hidden semi-Markov model.

12. The method according to any one of claims 1 to 11, wherein there is no spatial contact between the demonstration entity and the robot device when performing the activity.

13. The method according to any one of claims 1 to 12, wherein the kinematics of the demonstration entity and of the robot device differ.

14. Robot control device set up to carry out a method according to one of claims 1 to 13.

15. Computer program, comprising program instructions which, when executed by one or more processors, cause the one or more processors to carry out a method according to one of claims 1 to 14.

16. Computer-readable storage medium on which program instructions are stored which, when executed by one or more processors, cause one or more processors to carry out a method according to any one of claims 1 to 13.