WO2019095108A1 - Robot imitation learning method and apparatus, robot and storage medium - Google Patents

Robot imitation learning method and apparatus, robot and storage medium

Info

Publication number
WO2019095108A1
WO2019095108A1 (application no. PCT/CN2017/110923)
Authority
WO
WIPO (PCT)
Prior art keywords
pose
end effector
learning machine
preset
extreme learning
Prior art date
Application number
PCT/CN2017/110923
Other languages
French (fr)
Chinese (zh)
Inventor
欧勇盛
王志扬
段江哗
金少堃
徐升
熊荣
吴新宇
Original Assignee
深圳先进技术研究院
Application filed by 深圳先进技术研究院
Priority to PCT/CN2017/110923
Publication of WO2019095108A1

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 - Programme-controlled manipulators
    • B25J 9/16 - Programme controls

Definitions

  • The invention belongs to the technical field of robots and intelligent control, and in particular relates to an imitation learning method and apparatus for a robot, a robot, and a storage medium.
  • In current robot applications, and especially in industrial applications, the user usually pre-defines the motion trajectory of the robot arm, or presets a certain task environment, so that the robot arm simply repeats the planned motion.
  • Under this control mode, the robot arm cannot cope with changes in the task environment or sudden disturbances, and tasks in complex scenes or more difficult tasks require heavy manual programming. More importantly, the motion trajectory of the robot arm does not implicitly capture human operating habits. Robot imitation learning is an important way to solve these problems.
  • An object of the present invention is to provide an imitation learning method and apparatus for a robot, a robot, and a storage medium, aiming to solve the prior-art problem that the stability, reproduction accuracy, and model training speed of robot imitation learning cannot be guaranteed at the same time.
  • In one aspect, the present invention provides an imitation learning method for a robot, the method comprising the steps of: when a preset motion instruction is received, acquiring the pose of the end effector at the current moment; detecting whether the pose at the current moment is a preset target pose and, if so, determining that the end effector has completed the preset imitation learning task, otherwise generating the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints; adjusting the joint angle of each joint according to the predicted pose at the next moment and acquiring the adjusted pose of the end effector; and setting the adjusted pose as the pose at the current moment and jumping back to the step of detecting whether the pose at the current moment is the preset target pose.
  • In another aspect, the present invention provides an imitation learning device for a robot, the device comprising:
  • a pose acquiring unit, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received;
  • a pose determining unit, configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task, otherwise generate the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints;
  • a motion adjusting unit, configured to adjust the joint angle of each joint according to the predicted pose at the next moment and acquire the adjusted pose of the end effector; and
  • a pose setting unit, configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit performs the operation of detecting whether the pose at the current moment is the preset target pose.
  • In another aspect, the present invention also provides a robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the imitation learning method of the robot described above.
  • In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the imitation learning method of the robot described above.
  • The invention builds an extreme learning machine model in advance and derives stability constraints for the extreme learning machine model; a dynamic prediction model is obtained by training the extreme learning machine model under those stability constraints. When a motion instruction is received, whether the pose of the end effector at the current moment is the target pose is detected; if so, the end effector is determined to have completed the imitation learning task; otherwise, the predicted pose of the end effector at the next moment is generated according to the pose of the end effector at the current moment and the dynamic prediction model, the joints of the end effector are adjusted according to the predicted pose, and the process jumps back to the step of detecting whether the pose of the end effector at the current moment is the target pose. This simultaneously guarantees the stability, reproduction accuracy, and model training speed of robot imitation learning and effectively improves the human-likeness of the robot motion.
  • FIG. 1 is a flowchart of an implementation of an imitation learning method of a robot according to Embodiment 1 of the present invention;
  • FIG. 2 is a flowchart of an implementation of collecting a data sample set and training a dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention;
  • FIG. 3 is a schematic structural diagram of an imitation learning device of a robot according to Embodiment 3 of the present invention;
  • FIG. 4 is a schematic structural diagram of an imitation learning device of a robot according to Embodiment 4 of the present invention; and
  • FIG. 5 is a schematic structural diagram of a robot according to Embodiment 5 of the present invention.
  • Embodiment 1:
  • FIG. 1 is a flowchart showing an implementation process of an imitation learning method for a robot according to Embodiment 1 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, which are described in detail as follows:
  • In step S101, when a preset motion instruction is received, the pose of the end effector at the current moment is acquired.
  • The embodiments of the present invention are applicable to, but not limited to, robots with joints, links, and similar structures that can perform actions such as reaching and grasping.
  • Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment from these joint angles via forward kinematics; alternatively, if the robot is equipped with a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from that sensor. Here the pose includes the position and orientation of the end effector.
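  • As an illustrative sketch only (not part of the patented method), the forward-kinematics step for a hypothetical two-link planar arm could look as follows; the link lengths, joint angles, and planar pose representation are assumptions made for the example:

```python
import numpy as np

# Hypothetical link lengths (metres) of a two-link planar arm; a real robot
# would use its own kinematic parameters or read the pose from a sensor.
L1, L2 = 0.4, 0.3

def forward_kinematics(q):
    """Planar pose (x, y, heading) of the end effector from joint angles q = [q1, q2]."""
    q1, q2 = q
    x = L1 * np.cos(q1) + L2 * np.cos(q1 + q2)
    y = L1 * np.sin(q1) + L2 * np.sin(q1 + q2)
    heading = q1 + q2  # orientation of the last link
    return np.array([x, y, heading])

# Example: pose of the end effector for the current joint angles.
current_pose = forward_kinematics(np.array([0.3, -0.5]))
```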
  • In step S102, it is detected whether the pose at the current moment is the preset target pose.
  • In the embodiment of the present invention, it is detected whether the pose of the end effector at the current moment is the preset target pose; when it is, step S106 is performed, otherwise step S103 is performed.
  • In step S103, the predicted pose of the end effector at the next moment is generated according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
  • In the embodiment of the present invention, when the pose of the end effector at the current moment is not the preset target pose, the pose of the end effector needs to be adjusted. The pre-trained dynamic prediction model predicts the change of the end effector's state from its current state; therefore, after the pose of the end effector at the current moment is fed into the dynamic prediction model, the model outputs the current motion velocity of the end effector. From the pose and velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as $x_{t+1} = x_t + \dot{x}_t\,\delta t$, where $x_{t+1}$ is the predicted pose at the next moment $t+1$, $x_t$ is the pose at the current moment $t$, $\dot{x}_t$ is the output of the dynamic prediction model, and $\delta t$ is the preset sampling time interval.
  • In the embodiment of the present invention, during the training of the dynamic prediction model, an extreme learning machine model is built in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised way under those constraints. The trained extreme learning machine model is then used as the trained dynamic prediction model; the combination of the extreme learning machine and the Lyapunov-derived stability constraints effectively guarantees the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time.
  • The training samples used to train the extreme learning machine model are collected during the user's teaching process; for the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of the steps in Embodiment 2, which is not repeated here.
  • In step S104, the joint angle of each joint is adjusted according to the predicted pose at the next moment, and the adjusted pose of the end effector is acquired.
  • In the embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose is obtained by forward kinematics from the joint angles of the robot after the adjustment.
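  • A sketch of this adjust-and-read-back step, assuming a hypothetical `robot` object whose `inverse_kinematics`, `move_joints`, and `forward_kinematics` methods stand in for the robot's own kinematics and control interfaces:

```python
def adjust_to_predicted_pose(robot, predicted_pose):
    """Move the joints toward the predicted pose and return the pose actually reached."""
    # Joint angles that would place the end effector at the predicted pose.
    target_joint_angles = robot.inverse_kinematics(predicted_pose)
    # Command the joints; the reached angles may differ slightly from the targets.
    reached_joint_angles = robot.move_joints(target_joint_angles)
    # Recover the adjusted pose of the end effector by forward kinematics.
    return robot.forward_kinematics(reached_joint_angles)
```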
  • In step S105, the adjusted pose is set as the pose at the current moment.
  • In the embodiment of the present invention, the adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the process jumps back to step S102 to detect whether the pose of the end effector at the current moment is the preset target pose; this loop continues until the pose of the end effector at the current moment matches the preset target pose.
  • In step S106, it is determined that the end effector has completed the preset imitation learning task.
  • In the embodiment of the present invention, when the adjusted pose of the end effector is the target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task.
  • In the embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment, the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and whether the pose of the end effector at the current moment is the target pose is checked again; this loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
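  • Putting steps S101 to S106 together, one possible shape of the execution loop is sketched below; the `robot` interface, the tolerance used to decide that the target pose has been reached, and the sampling interval are assumptions for illustration:

```python
import numpy as np

def run_imitation_task(robot, dynamic_model, target_pose, delta_t=0.01, tol=1e-3):
    """Iterate steps S102-S105 until the end-effector pose reaches the target pose (S106)."""
    pose = robot.forward_kinematics(robot.get_joint_angles())    # S101: current pose
    while np.linalg.norm(pose - target_pose) > tol:              # S102: target reached?
        velocity = dynamic_model(pose)                           # S103: model output
        predicted_pose = pose + velocity * delta_t               # S103: predicted pose
        joint_angles = robot.inverse_kinematics(predicted_pose)  # S104: required joints
        reached_angles = robot.move_joints(joint_angles)
        pose = robot.forward_kinematics(reached_angles)          # S105: adjusted pose
    return pose                                                  # S106: task completed
```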
  • Embodiment 2:
  • FIG. 2 shows the implementation process of collecting the training sample set and training the dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, described in detail as follows:
  • In step S201, during teaching, the pose of the end effector is sampled on each teaching trajectory of the end effector at a preset sampling time interval.
  • In the embodiment of the present invention, the teaching action may be given by a demonstrator or a user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector on each motion trajectory (teaching trajectory) at the preset sampling time interval.
  • The teaching mode used during teaching is not restricted. As an example, the demonstrator can operate the robot through a remote controller or a teach pendant to give the teaching action, can grasp the end effector and move it along a trajectory in the plane or in space, or can perform the motion task in person while wearing a data glove so that the teaching action is captured by the data glove.
  • In step S202, the velocity at each sampling point of the end effector is calculated from the sampling time interval and the pose at each sampling point, and the pose and velocity at each sampling point of the end effector are combined to form the training samples of the training sample set.
  • In the embodiment of the present invention, after the pose at each sampling point of the end effector is obtained, the velocity at each sampling point can be calculated. As an example, the velocity at each sampling point can be computed as $\dot{x}_k^i = (x_{k+1}^i - x_k^i)/\delta t$, where $\delta t$ is the preset sampling time interval and $\dot{x}_k^i$ is the velocity of the end effector at the $k$-th sampling point on the $i$-th teaching trajectory. The pose and velocity at each sampling point are then combined to form a training sample of the training sample set, which can be written as $\{x_k^i, \dot{x}_k^i\}$.
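  • A sketch of this sample construction using the same finite-difference formula; the layout of the demonstration data (a list of per-trajectory pose arrays) is an assumption:

```python
import numpy as np

def build_training_set(trajectories, delta_t):
    """Build (pose, velocity) training pairs from demonstrated trajectories.

    `trajectories` is a list of arrays, each of shape (N_i, d), holding the poses
    sampled along one teaching trajectory at interval delta_t."""
    poses, velocities = [], []
    for traj in trajectories:
        # Finite-difference velocity (x_{k+1} - x_k) / delta_t; the last sample of
        # each trajectory has no successor and therefore no velocity label.
        vel = (traj[1:] - traj[:-1]) / delta_t
        poses.append(traj[:-1])
        velocities.append(vel)
    X = np.vstack(poses)       # model inputs: poses
    O = np.vstack(velocities)  # target outputs: velocities
    return X, O
```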
  • In step S203, an extreme learning machine model is constructed, and the input and target output of the extreme learning machine model are initialized according to the training sample set collected during the preset teaching process.
  • In the embodiment of the present invention, the extreme learning machine model is a special feed-forward neural network model: it contains only one hidden layer, and the number of hidden-layer neurons as well as their weights and biases are determined randomly. During training, the hidden-layer weights and biases remain fixed and only the output-layer weights are modified. Adopting the extreme learning machine model as the dynamic prediction model for robot imitation learning therefore yields good training results without requiring large-scale training data, and also makes it convenient to add stability constraints to the extreme learning machine model.
  • In the embodiment of the present invention, an extreme learning machine model is constructed, which can be expressed as $\dot{x} = f(x) = \sum_{i=1}^{N} \beta_i\, g(w_i \cdot x + b_i)$, where $N$, $b_i$, and $w_i$ are the number, biases, and weights of the hidden-layer neurons, $\beta = (\beta_1, \ldots, \beta_i, \ldots, \beta_N)$ are the output-layer weights of the extreme learning machine network, and $x$ and $g(x)$ are the input and the activation function of the extreme learning machine model, respectively. The activation function can be a sigmoid function or a hyperbolic tangent (tanh) function; the activation function is not restricted here.
  • The pose of the end effector in each training sample of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training sample is set as the target output of the extreme learning machine model, so the optimization goal of the extreme learning machine model is $\min_{\beta} \lVert H\beta - O \rVert^2$, where $H$ is the hidden-layer output matrix and $O$ is the velocity of the end effector in the training samples, i.e. the target output of the extreme learning machine model.
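  • A sketch of such an extreme learning machine with a randomly drawn, fixed hidden layer; the hidden-layer size and the choice of tanh as the activation function are assumptions within the options the text allows:

```python
import numpy as np

class ExtremeLearningMachine:
    """Single-hidden-layer network with random, fixed hidden weights and biases."""

    def __init__(self, input_dim, hidden_dim=100, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.normal(size=(hidden_dim, input_dim))  # hidden weights w_i (fixed)
        self.b = rng.normal(size=hidden_dim)               # hidden biases b_i (fixed)
        self.beta = None                                   # output-layer weights, learned

    def hidden(self, X):
        """Hidden-layer output matrix H with entries g(w_i . x + b_i), here g = tanh."""
        return np.tanh(X @ self.W.T + self.b)

    def predict(self, X):
        """Velocity prediction x_dot = H(x) @ beta."""
        return self.hidden(X) @ self.beta
```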
  • In step S204, stability constraints are constructed according to the preset Lyapunov theorem; the stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint.
  • In the embodiment of the present invention, stability constraints suitable for the extreme learning machine model are derived based on the Lyapunov theorem, and these constraints are imposed as conditions on the extreme learning machine model so that the trained extreme learning machine model guarantees the stability of the robot's imitation learning.
  • The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint; a generic sketch of the Lyapunov conditions underlying such constraints is given below.
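  • As background only, and not the patent's specific constraint equations, the generic Lyapunov conditions from which such constraints are typically derived can be written as follows, assuming a quadratic Lyapunov function centred at the target pose $x^*$:

```latex
% Generic Lyapunov conditions for global asymptotic stability of the learned
% dynamics \dot{x} = f(x) at the target pose x^*, under the (assumed) quadratic
% Lyapunov function V(x) = (x - x^*)^\top (x - x^*):
\begin{aligned}
  f(x^*) &= 0, \\
  \dot{V}(x) &= 2\,(x - x^*)^\top f(x) < 0 \quad \text{for all } x \neq x^*
\end{aligned}
% i.e. the predicted velocity must always decrease the distance to the target.
```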
  • In step S205, the extreme learning machine model is trained in a supervised way under the stability constraints, and the trained extreme learning machine model is set as the dynamic prediction model.
  • In the embodiment of the present invention, the optimization goal of the extreme learning machine model is optimized to obtain a set of output-layer weights $\beta$ that satisfy the stability constraints while optimizing the objective. The unconstrained optimization target can be solved by least squares as $\beta = H^{+}O$, where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$, and the solution is then restricted by the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model.
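  • A sketch of the unconstrained least-squares fit via the Moore-Penrose pseudoinverse; the constrained version described in the text would replace this with a constrained optimization step enforcing the stability conditions, which is not shown here:

```python
import numpy as np

def fit_output_weights(elm, X, O):
    """Least-squares output weights beta = H^+ O, with H the hidden-layer output
    matrix for the training poses X and O the target velocities."""
    H = elm.hidden(X)
    elm.beta = np.linalg.pinv(H) @ O  # Moore-Penrose generalized inverse of H
    return elm

# Usage sketch: given X, O from the collected demonstrations,
#   elm = ExtremeLearningMachine(input_dim=X.shape[1])
#   fit_output_weights(elm, X, O)
# the fitted elm.predict can then serve as the dynamic prediction model
# (here without the stability constraints of the patent).
```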
  • In the embodiment of the present invention, an extreme learning machine model is constructed, stability constraints suitable for the extreme learning machine model are derived based on the Lyapunov theorem, and the extreme learning machine model is trained under those constraints; the trained extreme learning machine model is the trained dynamic prediction model. This effectively improves the model training speed of robot imitation learning while also guaranteeing the stability and reproduction accuracy of the imitation learning.
  • Embodiment 3:
  • FIG. 3 is a diagram showing the structure of an imitation learning device for a robot according to Embodiment 3 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, including:
  • The pose acquiring unit 31 is configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
  • Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment from the joint angles via forward kinematics; alternatively, if the robot has a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from the sensor.
  • The pose determining unit 32 is configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task; otherwise, it generates the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
  • If the pose at the current moment is the preset target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and the end effector is determined to have completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until it reaches the target pose.
  • The pre-trained dynamic prediction model predicts the change of the end effector's state from its current state; after the pose of the end effector at the current moment is fed into the dynamic prediction model, the model outputs the current motion velocity of the end effector, and the predicted pose of the end effector at the next moment can be calculated as $x_{t+1} = x_t + \dot{x}_t\,\delta t$, where $\delta t$ is the preset sampling time interval.
  • During the training of the dynamic prediction model, an extreme learning machine model is built in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised way under those constraints. The trained extreme learning machine model is then used as the trained dynamic prediction model; the combination of the extreme learning machine and the Lyapunov-derived stability constraints effectively guarantees the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time.
  • The training samples used to train the extreme learning machine model are collected during the user's teaching process; for the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of the corresponding units in Embodiment 4, which is not repeated here.
  • The motion adjusting unit 33 is configured to adjust the joint angle of each joint according to the predicted pose at the next moment and to obtain the adjusted pose of the end effector.
  • After the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose is obtained by forward kinematics from the joint angles of the robot after the adjustment.
  • The pose setting unit 34 is configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit 32 performs the operation of detecting whether the pose at the current moment is the preset target pose.
  • The adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the pose determining unit 32 checks whether this pose is the preset target pose; this loop continues until the pose of the end effector at the current moment matches the preset target pose.
  • When the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment, the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and whether the pose of the end effector at the current moment is the target pose is checked again; this loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
  • Embodiment 4:
  • FIG. 4 is a diagram showing the structure of an imitation learning device for a robot according to Embodiment 4 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, including:
  • The teaching acquisition unit 41 is configured to collect the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval during teaching.
  • The teaching action may be given by a demonstrator or a user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector on each teaching trajectory at the preset sampling time interval.
  • The sample generating unit 42 is configured to calculate the velocity at each sampling point of the end effector from the sampling time interval and the pose at each sampling point, and to combine the pose and velocity at each sampling point of the end effector into the training samples of the training sample set.
  • After the pose at each sampling point of the end effector is obtained, the velocity at each sampling point can be calculated. As an example, the velocity at each sampling point can be computed as $\dot{x}_k^i = (x_{k+1}^i - x_k^i)/\delta t$, where $\delta t$ is the preset sampling time interval and $\dot{x}_k^i$ is the velocity of the end effector at the $k$-th sampling point on the $i$-th teaching trajectory; the pose and velocity at each sampling point are combined into a training sample of the training sample set, which can be written as $\{x_k^i, \dot{x}_k^i\}$.
  • The model construction unit 43 is configured to construct an extreme learning machine model and to initialize the input and target output of the extreme learning machine model according to the training sample set collected during the preset teaching process.
  • The extreme learning machine model can be expressed as $\dot{x} = f(x) = \sum_{i=1}^{N} \beta_i\, g(w_i \cdot x + b_i)$, where $\beta = (\beta_1, \ldots, \beta_i, \ldots, \beta_N)$ are the output-layer weights of the extreme learning machine network and $x$ and $g(x)$ are the input and the activation function of the model, respectively.
  • The pose of the end effector in each training sample of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training sample is set as the target output, so the optimization goal of the extreme learning machine model is $\min_{\beta} \lVert H\beta - O \rVert^2$, where $O$ is the velocity of the end effector in the training samples, i.e. the target output of the extreme learning machine model.
  • The constraint construction unit 44 is configured to construct stability constraints according to the preset Lyapunov theorem; the stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint.
  • Stability constraints suitable for the extreme learning machine model are derived based on the Lyapunov theorem and include the globally asymptotically stable constraint and the locally asymptotically stable constraint.
  • The model training unit 45 is configured to perform supervised training of the extreme learning machine model under the stability constraints and to set the trained extreme learning machine model as the dynamic prediction model.
  • The optimization goal of the extreme learning machine model is optimized to obtain a set of output-layer weights $\beta$ that satisfy the stability constraints while optimizing the objective; the unconstrained optimization target can be solved by least squares as $\beta = H^{+}O$, where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$, and the solution is then restricted by the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model.
  • The pose acquiring unit 46 is configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
  • Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment via forward kinematics; alternatively, if the robot has a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from the sensor.
  • The pose determining unit 47 is configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task; otherwise, it generates the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
  • If the pose at the current moment is the preset target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until it reaches the target pose.
  • When the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model, which outputs the motion velocity of the end effector at the current moment; from this pose and velocity, the predicted pose of the end effector at the next moment can be calculated as $x_{t+1} = x_t + \dot{x}_t\,\delta t$, where $\delta t$ is the preset sampling time interval.
  • The motion adjusting unit 48 is configured to adjust the joint angle of each joint according to the predicted pose at the next moment and to obtain the adjusted pose of the end effector.
  • After the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose is obtained by forward kinematics from the joint angles of the robot after the adjustment.
  • The pose setting unit 49 is configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit 47 performs the operation of detecting whether the pose at the current moment is the preset target pose.
  • In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted iteratively under the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby guaranteeing the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time and effectively improving the human-likeness of the robot motion.
  • In the embodiment of the present invention, each unit of the imitation learning device of the robot may be implemented by corresponding hardware or software; each unit may be an independent software/hardware unit, or the units may be integrated into a single software/hardware unit, and the invention is not limited in this respect.
  • Embodiment 5:
  • Fig. 5 shows the structure of a robot provided in Embodiment 5 of the present invention, and for convenience of explanation, only parts related to the embodiment of the present invention are shown.
  • the robot 5 of the embodiment of the present invention includes a processor 50, a memory 51, and a computer program 52 stored in the memory 51 and operable on the processor 50.
  • The processor 50, when executing the computer program 52, implements the steps in the various method embodiments described above, such as steps S101 through S106 shown in FIG. 1.
  • Alternatively, the processor 50, when executing the computer program 52, implements the functions of the units in the various apparatus embodiments described above, such as the functions of units 31 through 34 shown in FIG. 3.
  • In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted iteratively under the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby guaranteeing the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time and effectively improving the human-likeness of the robot motion.
  • In the embodiment of the present invention, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps in the foregoing method embodiments, for example steps S101 through S106 shown in FIG. 1.
  • Alternatively, the computer program, when executed by the processor, implements the functions of the units in the various apparatus embodiments described above, such as the functions of units 31 through 34 shown in FIG. 3.
  • In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted iteratively under the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby guaranteeing the stability, reproduction accuracy, and model training speed of robot imitation learning and effectively improving the human-likeness of the robot motion.
  • the computer readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, a recording medium such as a ROM/RAM, a magnetic disk, an optical disk, a flash memory, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

A robot imitation learning method and apparatus, a robot, and a storage medium. The method comprises: when a motion instruction is received, acquiring the pose of an end effector at the current moment and detecting whether the pose at the current moment is a target pose; if so, determining that the end effector has completed a preset imitation learning task; otherwise, generating a predicted pose of the end effector at the next moment according to the current pose and a dynamic prediction model, adjusting the angle of each joint according to the predicted pose, setting the adjusted pose of the end effector as the pose at the current moment, and jumping back to the step of detecting whether the pose at the current moment is the target pose. The dynamic prediction model is obtained by training an extreme learning machine model in combination with a preset stability constraint, which guarantees the stability, reproduction accuracy, and model training speed of the robot imitation learning and effectively improves the human-likeness of the robot motion.

Description

Robot imitation learning method and apparatus, robot and storage medium
Technical field
The invention belongs to the technical field of robots and intelligent control, and in particular relates to an imitation learning method and apparatus for a robot, a robot, and a storage medium.
Background
In current robot applications, and especially in industrial applications, the user usually pre-defines the motion trajectory of the robot arm or presets a certain task environment, so that the robot arm simply repeats the planned motion. Under this control mode, the robot arm cannot cope with changes in the task environment or sudden disturbances, and realizing tasks in complex scenes or more difficult tasks requires heavy manual programming. More importantly, the motion trajectory of the robot arm does not implicitly capture human operating habits. Robot imitation learning is an important way to solve these problems.
When modeling robot motion through imitation learning, researchers usually hope to achieve three goals. First, the robot should always move to the desired target; from a control perspective, the system should possess a certain stability, that is, when the robot is disturbed in time or space during its motion and deviates from the trajectory, it can still converge accurately to the target. Second, the motion trajectory of the robot should have a contour as similar as possible to the previously demonstrated human trajectory, i.e. the "accuracy" of the robot's reproduction. Third, the time required by the machine learning method to train the model parameters should be minimized, i.e. the "speed" of model training should be increased.
"Stability", "accuracy", and "speed" usually constrain and conflict with one another, and achieving the best trade-off among the three is the key to robot imitation learning. At present, the best-known robot imitation learning approach internationally models the motion of the robot by building a "dynamical system". The "dynamical system" was initially modeled with a Gaussian mixture model and took stability constraints into account, but because training such a model is relatively complex, it cannot effectively balance "stability", "accuracy", and "speed". Domestic robot imitation learning methods are also mostly based on Gaussian mixture models or Gaussian processes and do not take the stability problem into account, so they likewise cannot effectively balance "stability", "accuracy", and "speed".
Summary of the invention
An object of the present invention is to provide an imitation learning method and apparatus for a robot, a robot, and a storage medium, aiming to solve the prior-art problem that the stability, reproduction accuracy, and model training speed of robot imitation learning cannot be guaranteed at the same time.
In one aspect, the present invention provides an imitation learning method for a robot, the method comprising the following steps:
when a preset motion instruction is received, acquiring the pose of the end effector at the current moment;
detecting whether the pose at the current moment is a preset target pose; if so, determining that the end effector has completed the preset imitation learning task; otherwise, generating the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints;
adjusting the joint angle of each joint according to the predicted pose at the next moment, and acquiring the adjusted pose of the end effector; and
setting the adjusted pose as the pose at the current moment, and jumping back to the step of detecting whether the pose at the current moment is the preset target pose.
In another aspect, the present invention provides an imitation learning device for a robot, the device comprising:
a pose acquiring unit, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received;
a pose determining unit, configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task, otherwise generate the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints;
a motion adjusting unit, configured to adjust the joint angle of each joint according to the predicted pose at the next moment and acquire the adjusted pose of the end effector; and
a pose setting unit, configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit performs the operation of detecting whether the pose at the current moment is the preset target pose.
In another aspect, the present invention also provides a robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the imitation learning method of the robot described above.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the imitation learning method of the robot described above.
The invention builds an extreme learning machine model in advance and derives stability constraints for the extreme learning machine model; a dynamic prediction model is obtained by training the extreme learning machine model under those stability constraints. When a motion instruction is received, whether the pose of the end effector at the current moment is the target pose is detected; if so, the end effector is determined to have completed the imitation learning task; otherwise, the predicted pose of the end effector at the next moment is generated according to the pose of the end effector at the current moment and the dynamic prediction model, the joints of the end effector are adjusted according to the predicted pose, and the process jumps back to the step of detecting whether the pose of the end effector at the current moment is the target pose. In this way the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
Description of the drawings
FIG. 1 is a flowchart of the implementation of the imitation learning method of a robot according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of the implementation of collecting the data sample set and training the dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of the imitation learning device of a robot according to Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of the imitation learning device of a robot according to Embodiment 4 of the present invention; and
FIG. 5 is a schematic structural diagram of the robot according to Embodiment 5 of the present invention.
Detailed description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below in conjunction with specific embodiments:
Embodiment 1:
FIG. 1 shows the implementation process of the imitation learning method of a robot according to Embodiment 1 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, described in detail as follows:
In step S101, when a preset motion instruction is received, the pose of the end effector at the current moment is acquired.
The embodiments of the present invention are applicable to, but not limited to, robots with joints, links, and similar structures that can perform actions such as reaching and grasping. Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment from these joint angles via forward kinematics; alternatively, if the robot is equipped with a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from that sensor. Here the pose includes the position and orientation of the end effector.
In step S102, it is detected whether the pose at the current moment is the preset target pose.
In the embodiment of the present invention, it is detected whether the pose of the end effector at the current moment is the preset target pose; when it is, step S106 is performed, otherwise step S103 is performed.
In step S103, the predicted pose of the end effector at the next moment is generated according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
In the embodiment of the present invention, when the pose of the end effector at the current moment is not the preset target pose, the pose of the end effector needs to be adjusted. The pre-trained dynamic prediction model predicts the change of the end effector's state from its current state; therefore, after the pose of the end effector at the current moment is fed into the dynamic prediction model, the model outputs the current motion velocity of the end effector. From the pose and velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as
$$x_{t+1} = x_t + \dot{x}_t\,\delta t$$
where $x_{t+1}$ is the predicted pose of the end effector at the next moment $t+1$, $x_t$ is the pose of the end effector at the current moment $t$, $\dot{x}_t$ is the output of the dynamic prediction model, and $\delta t$ is the preset sampling time interval.
In the embodiment of the present invention, during the training of the dynamic prediction model, an extreme learning machine model is built in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised way under those constraints; the trained extreme learning machine model is the trained dynamic prediction model. By combining the extreme learning machine with the stability constraints derived from the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are effectively guaranteed at the same time.
The training samples used to train the extreme learning machine model are collected during the user's teaching process; for the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of the steps in Embodiment 2, which is not repeated here.
In step S104, the joint angle of each joint is adjusted according to the predicted pose at the next moment, and the adjusted pose of the end effector is acquired.
In the embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles of the robot are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose of the end effector is obtained by forward kinematics from the joint angles of the robot after the adjustment.
In step S105, the adjusted pose is set as the pose at the current moment.
In the embodiment of the present invention, the adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the process jumps back to step S102 to detect whether the pose of the end effector at the current moment is the preset target pose; this loop continues until the pose of the end effector at the current moment matches the preset target pose.
In step S106, it is determined that the end effector has completed the preset imitation learning task.
In the embodiment of the present invention, when the adjusted pose of the end effector is the target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task.
In the embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment, the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and whether the pose of the end effector at the current moment is the target pose is checked again; this loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
Embodiment 2:
FIG. 2 shows the implementation process of collecting the training sample set and training the dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, described in detail as follows:
In step S201, during teaching, the pose of the end effector is sampled on each teaching trajectory of the end effector at a preset sampling time interval.
In the embodiment of the present invention, the teaching action may be given by a demonstrator or a user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector on each motion trajectory (teaching trajectory) at the preset sampling time interval. The collected pose of the end effector can be written as $x_k^i$, where $i = 1, \ldots, N_{traj}$, $k = 1, \ldots, N_i$, $N_{traj}$ is the number of teaching trajectories, and $N_i$ is the number of sampling points on the $i$-th teaching trajectory.
In the embodiment of the present invention, the teaching mode used during teaching is not restricted. As an example, the demonstrator can operate the robot through a remote controller or a teach pendant to give the teaching action, can grasp the end effector and move it along a trajectory in the plane or in space, or can perform the motion task in person while wearing a data glove so that the teaching action is captured by the data glove.
在步骤S202中,根据采样时间间隔和末端执行器每个采样点处的位姿,计算末端执行器每个采样点处的速度,将末端执行器每个采样点处的位姿、速度组合构成训练样本集的训练样本。In step S202, according to the sampling time interval and the pose at each sampling point of the end effector, the speed at each sampling point of the end effector is calculated, and the pose and speed combination at each sampling point of the end effector are combined. Train the training samples of the sample set.
In this embodiment of the present invention, after the pose at each sampling point of the end effector has been obtained by sampling, the velocity at each sampling point of the end effector can be calculated. As an example, the velocity at each sampling point of the end effector can be calculated as

ẋ_k^i = (x_{k+1}^i − x_k^i) / δt

where δt is the preset sampling time interval, x_k^i is the pose at the k-th sampling point on the i-th teaching trajectory, and ẋ_k^i is the velocity of the end effector at the k-th sampling point on the i-th teaching trajectory. The pose and the velocity at each sampling point of the end effector are then combined to form the training samples of the training sample set, and a training sample can be expressed as the pair (x_k^i, ẋ_k^i).
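As an illustration of this data-preparation step, the following sketch builds pose-velocity training pairs from demonstrated trajectories by forward differencing. It is not part of the original disclosure; the NumPy helper, the synthetic demonstrations and the 0.01 s sampling interval are assumptions made only for the example.

```python
import numpy as np

def build_training_set(trajectories, dt):
    """Turn demonstrated pose trajectories into (pose, velocity) training pairs.

    trajectories: list of arrays, each of shape (N_i, d), holding the poses
                  sampled on one teaching trajectory at interval dt.
    Returns X (poses) and O (finite-difference velocities), both of shape (M, d).
    """
    poses, velocities = [], []
    for traj in trajectories:
        # forward difference between consecutive samples on the same trajectory
        vel = (traj[1:] - traj[:-1]) / dt
        poses.append(traj[:-1])
        velocities.append(vel)
    return np.vstack(poses), np.vstack(velocities)

# Example with two synthetic 2-D demonstrations ending at the origin
t1 = np.linspace(0.0, 1.0, 200)
demo1 = np.column_stack([1.0 - t1, 1.0 - t1])
t2 = np.linspace(0.0, 1.0, 150)
demo2 = np.column_stack([0.8 * (1.0 - t2), -0.5 * (1.0 - t2)])
X, O = build_training_set([demo1, demo2], dt=0.01)
print(X.shape, O.shape)   # (348, 2) (348, 2)
```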
In step S203, an extreme learning machine model is constructed, and the input and the target output of the extreme learning machine model are initialized according to the training sample set collected during the preset teaching process.
In this embodiment of the present invention, the extreme learning machine model is a special feed-forward neural network model. Its particularity is that it contains only one hidden layer, and the number of neurons, the weights and the biases of the hidden layer are determined randomly. During the training of the extreme learning machine model, the weights and biases of the hidden layer remain unchanged and only the weights of the output layer are modified. Therefore, adopting the extreme learning machine model as the dynamic prediction model for robot imitation learning yields good training results without requiring large-scale training data, and also makes it convenient to add stability constraints to the extreme learning machine model.
In this embodiment of the present invention, an extreme learning machine model is constructed, and the extreme learning machine model can be expressed as:
f(x) = Σ_{i=1}^{N} β_i g(w_i · x + b_i)

where N, b_i and w_i are the number of neurons, the biases and the weights of the hidden layer in the extreme learning machine model, β = (β_1, …, β_i, …, β_N) is the weight of the output layer in the extreme learning machine network model, and x and g(x) are the input and the activation function of the extreme learning machine model, respectively. The activation function may be a sigmoid function or a hyperbolic tangent (tanh) function, and is not limited here.
In addition, the input layer and the output layer of the extreme learning machine model should have the same dimension, i.e. the same number of neurons d: if the end effector moves in a two-dimensional plane, d = 2, and if the end effector moves in three-dimensional space, d = 3.
In this embodiment of the present invention, the pose of the end effector in the training samples of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training samples is set as the target output of the extreme learning machine model, so that the optimization objective of the extreme learning machine model is obtained as:
min_β ‖Hβ − O‖

where H is the hidden-layer output matrix of the extreme learning machine model evaluated on the training inputs, and O is the velocity of the end effector in the training samples, which is also the target output of the extreme learning machine model.
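To make the model structure concrete, the sketch below sets up a randomly initialized hidden layer and the least-squares objective described above. It is only an illustrative reading of the text; the hidden-layer size, the tanh activation and all variable names are assumptions rather than values taken from the original disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2            # pose dimension (2-D plane in this sketch)
n_hidden = 100   # number of hidden neurons, chosen arbitrarily here

# Hidden-layer parameters are drawn once at random and then kept fixed.
W = rng.uniform(-1.0, 1.0, size=(n_hidden, d))   # input weights w_i
b = rng.uniform(-1.0, 1.0, size=n_hidden)        # biases b_i

def hidden_output(X):
    """Hidden-layer output matrix H; entry (k, i) is g(w_i . x_k + b_i)."""
    return np.tanh(X @ W.T + b)                  # tanh as the activation g

def elm_forward(X, beta):
    """Model output f(x) = sum_i beta_i g(w_i . x + b_i) for each row of X."""
    return hidden_output(X) @ beta               # beta has shape (n_hidden, d)

def objective(beta, X, O):
    """Least-squares training objective ||H beta - O||."""
    return np.linalg.norm(hidden_output(X) @ beta - O)
```

Only beta is adjusted during training; W and b stay at their random initial values, which is what turns the training into a linear least-squares problem.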
In step S204, the stability constraints are constructed according to the preset Lyapunov theorem; the stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint.
In this embodiment of the present invention, stability constraints applicable to the extreme learning machine model are derived based on the Lyapunov theorem. The stability constraints impose conditions on the weights in the extreme learning machine model, so that the trained extreme learning machine model can guarantee the stability of robot imitation learning. The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint. The globally asymptotically stable constraint can be expressed as:
the condition of formula (PCTCN2017110923-appb-000013) holds, and among the eigenvalues of Φ_i (their number being given by formula (PCTCN2017110923-appb-000014)) there exist d linearly independent eigenvalues, where Φ_i is the "symmetric part" of the expression of formula (PCTCN2017110923-appb-000015), the inequality of formula (PCTCN2017110923-appb-000016) holds, and "<" denotes that a matrix is negative definite. The locally asymptotically stable constraint can be expressed as: the condition of formula (PCTCN2017110923-appb-000017) holds. The formulas referenced here are provided as images in the original publication and are not reproduced in this text.
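The exact constraint matrices appear only as formula images in the original filing, so the sketch below does not reproduce them. It merely illustrates, under that caveat, the two operations the text relies on: forming the symmetric part of a matrix and testing it for negative definiteness through its eigenvalues.

```python
import numpy as np

def symmetric_part(A):
    """Symmetric part of a square matrix A."""
    return 0.5 * (A + A.T)

def is_negative_definite(A, tol=1e-9):
    """True if the symmetric part of A has only negative eigenvalues,
    i.e. A < 0 in the sense used for the stability constraints."""
    eigvals = np.linalg.eigvalsh(symmetric_part(A))
    return bool(np.all(eigvals < -tol))

# Example: a matrix whose symmetric part is negative definite
A = np.array([[-2.0, 1.0],
              [-1.0, -3.0]])
print(is_negative_definite(A))   # True
```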
In step S205, supervised training is performed on the extreme learning machine model according to the stability constraints, and the trained extreme learning machine model is set as the dynamic prediction model.
In this embodiment of the present invention, the optimization objective min_β ‖Hβ − O‖ of the extreme learning machine model is optimized to obtain a set of output-layer weights β that satisfies the stability constraints and makes the optimization objective optimal. As an example, the optimization objective can be solved by the least squares method to obtain β = H⁺O, which is then subjected to the stability constraints, where H⁺ is the Moore-Penrose generalized inverse of the matrix H. Finally, the trained extreme learning machine model is the trained dynamic prediction model.
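A minimal sketch of the unconstrained least-squares step named in this paragraph is given below, using NumPy's pseudoinverse. How the resulting weights are additionally constrained to satisfy the Lyapunov-based conditions is not shown, since those conditions appear only as formula images in the original publication.

```python
import numpy as np

def fit_output_weights(H, O):
    """Least-squares output weights beta = H^+ O, with H^+ the Moore-Penrose
    generalized inverse of the hidden-layer output matrix H."""
    return np.linalg.pinv(H) @ O

# Typical shapes: H is (n_samples, n_hidden) and O is (n_samples, d),
# so beta comes out as (n_hidden, d). In the patented method, beta would
# additionally be required to satisfy the stability constraints of step S204.
```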
In this embodiment of the present invention, an extreme learning machine model is constructed, stability constraints applicable to the extreme learning machine model are derived based on the Lyapunov theorem, and the extreme learning machine model is trained according to the training sample set collected during the teaching process and the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model, which effectively improves the model training speed of robot imitation learning while guaranteeing the stability and the reproduction accuracy of robot imitation learning.
Embodiment 3:
Fig. 3 shows the structure of the imitation learning apparatus for a robot provided by Embodiment 3 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown, including:
A pose acquiring unit 31, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
In this embodiment of the present invention, when a motion or movement instruction sent by a user or a control system is received, the robot can acquire the joint angle of each joint and then calculate the pose of the end effector at the current moment from these joint angles through forward kinematics. In addition, if the robot itself is equipped with a position sensor for the end effector, the pose of the end effector at the current moment can be obtained directly from that position sensor.
A pose determining unit 32, configured to detect whether the pose at the current moment is the preset target pose; if so, determine that the end effector has completed the preset imitation learning task; otherwise, generate the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, the dynamic prediction model being trained from the pre-built extreme learning machine model in combination with the preset stability constraints.
In this embodiment of the present invention, it is detected whether the pose at the current moment is the preset target pose. If so, it can be considered that the end effector has successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until the pose of the end effector is the target pose. The pre-trained dynamic prediction model is used to predict the change of the current state of the end effector from the current state of the end effector, so after the pose of the end effector at the current moment is input into the dynamic prediction model, the motion velocity of the end effector at the current moment output by the dynamic prediction model is obtained. From the pose and the motion velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as

x_{t+1} = x_t + ẋ_t · δt

where x_{t+1} is the predicted pose of the end effector at the next moment t+1, x_t is the pose of the end effector at the current moment t, ẋ_t is the output of the dynamic prediction model, and δt is the preset sampling time interval.
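The closed-loop use of this one-step prediction can be sketched as follows; the tolerance, the step limit and the toy velocity model used in the example are assumptions for illustration and are not taken from the original disclosure.

```python
import numpy as np

def roll_out(x0, x_target, predict_velocity, dt, tol=1e-3, max_steps=2000):
    """Iterate the one-step prediction until the end effector reaches the target.

    predict_velocity: callable mapping the current pose to the model output,
                      i.e. the velocity predicted by the trained dynamic model.
    Returns the list of poses that were commanded.
    """
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(max_steps):
        if np.linalg.norm(x - x_target) < tol:      # target pose reached
            break
        x = x + predict_velocity(x) * dt            # x_{t+1} = x_t + x_dot_t * dt
        path.append(x.copy())
    return path

# Example with a toy "model" whose predicted velocity always points at the origin
path = roll_out([1.0, 1.0], np.zeros(2), predict_velocity=lambda x: -x, dt=0.05)
print(len(path), path[-1])
```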
In this embodiment of the present invention, during the training of the dynamic prediction model, the extreme learning machine model is constructed in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and supervised training is performed on the extreme learning machine model in combination with the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model, so that through the combination of the extreme learning machine and the stability constraints derived from the Lyapunov theorem, the stability, reproduction accuracy and model training speed of robot imitation learning are effectively guaranteed at the same time.
The training samples used for training the extreme learning machine model are collected during the user's teaching process. For the collection of the training samples and the training of the dynamic prediction model, reference may be made to the detailed description of the corresponding units in Embodiment 4, which is not repeated here.
A motion adjusting unit 33, configured to adjust the joint angle of each joint according to the predicted pose at the next moment, and acquire the adjusted pose of the end effector.
In this embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, the change of joint angle required for each joint of the robot to move the end effector from the current pose to the predicted pose can be calculated through inverse kinematics, and the joint angle of each joint of the robot is adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, and the adjusted pose of the end effector can be calculated through forward kinematics from the angles of the joints after the robot has been adjusted.
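A sketch of one adjustment cycle is shown below. The patent does not specify a kinematics interface, so the robot object and its ik_solve, set_joint_angles, get_joint_angles and fk_pose methods are hypothetical names standing in for whatever inverse- and forward-kinematics routines a concrete robot provides.

```python
def adjust_joints(robot, x_pred):
    """One adjustment cycle: inverse kinematics to the predicted pose, command
    the joints, then read back the realized pose through forward kinematics.

    `robot` is assumed to expose ik_solve(), set_joint_angles(),
    get_joint_angles() and fk_pose(); these names are illustrative only.
    """
    q_target = robot.ik_solve(x_pred)        # joint angles that reach x_pred
    robot.set_joint_angles(q_target)         # move the joints
    q_actual = robot.get_joint_angles()      # angles actually reached
    return robot.fk_pose(q_actual)           # adjusted pose of the end effector
```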
A pose setting unit 34, configured to set the adjusted pose as the pose at the current moment, and cause the pose determining unit 32 to perform the operation of detecting whether the pose at the current moment is the preset target pose.
In this embodiment of the present invention, the adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the pose determining unit 32 performs the operation of detecting whether the pose of the end effector at the current moment is the preset target pose. This loop continues until the pose of the end effector at the current moment is the same as the preset target pose.
In this embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose at the current moment is input into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment; the angle of each joint is adjusted according to the predicted pose to obtain the adjusted pose of the end effector, and it is again determined whether the pose of the end effector at the current moment is the target pose. This loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
Embodiment 4:
Fig. 4 shows the structure of the imitation learning apparatus for a robot provided by Embodiment 4 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown, including:
A teaching acquisition unit 41, configured to sample the pose of the end effector on each teaching trajectory of the end effector at the preset sampling time interval during the teaching process.
In this embodiment of the present invention, a teaching action may be given by a demonstrator or a user during the teaching process, and the end effector moves according to the teaching action. The pose of the end effector is sampled on each motion trajectory (teaching trajectory) at the preset sampling time interval by the robot itself or by an external motion capture device. The collected poses of the end effector can be denoted as x_k^i, where i = 1, …, N_traj, k = 1, …, N_i, N_traj is the number of teaching trajectories, and N_i is the number of sampling points on the i-th teaching trajectory. The teaching mode used in the teaching process is not limited here.
A sample generating unit 42, configured to calculate the velocity at each sampling point of the end effector according to the sampling time interval and the pose of the end effector at each sampling point, and combine the pose and the velocity at each sampling point of the end effector to form the training samples of the training sample set.
In this embodiment of the present invention, after the pose at each sampling point of the end effector has been obtained by sampling, the velocity at each sampling point of the end effector can be calculated. As an example, the velocity at each sampling point of the end effector can be calculated as

ẋ_k^i = (x_{k+1}^i − x_k^i) / δt

where δt is the preset sampling time interval, x_k^i is the pose at the k-th sampling point on the i-th teaching trajectory, and ẋ_k^i is the velocity of the end effector at the k-th sampling point on the i-th teaching trajectory. The pose and the velocity at each sampling point of the end effector are then combined to form the training samples of the training sample set, and a training sample can be expressed as the pair (x_k^i, ẋ_k^i).
A model construction unit 43, configured to construct the extreme learning machine model, and initialize the input and the target output of the extreme learning machine model according to the training sample set collected during the preset teaching process.
In this embodiment of the present invention, an extreme learning machine model is constructed, and the extreme learning machine model can be expressed as:
f(x) = Σ_{i=1}^{N} β_i g(w_i · x + b_i)

where N, b_i and w_i are the number of neurons, the biases and the weights of the hidden layer in the extreme learning machine model, β = (β_1, …, β_i, …, β_N) is the weight of the output layer in the extreme learning machine network model, and x and g(x) are the input and the activation function of the extreme learning machine model, respectively; the activation function is not limited here.
In addition, the input layer and the output layer of the extreme learning machine model should have the same dimension, i.e. the same number of neurons d: if the end effector moves in a two-dimensional plane, d = 2, and if the end effector moves in three-dimensional space, d = 3.
In this embodiment of the present invention, the pose of the end effector in the training samples of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training samples is set as the target output of the extreme learning machine model, so that the optimization objective of the extreme learning machine model is obtained as:
min_β ‖Hβ − O‖

where H is the hidden-layer output matrix of the extreme learning machine model evaluated on the training inputs, and O is the velocity of the end effector in the training samples, which is also the target output of the extreme learning machine model.
A constraint construction unit 44, configured to construct the stability constraints according to the preset Lyapunov theorem, the stability constraints including a globally asymptotically stable constraint and a locally asymptotically stable constraint.
In this embodiment of the present invention, stability constraints applicable to the extreme learning machine model are derived based on the Lyapunov theorem. The stability constraints impose conditions on the weights in the extreme learning machine model, so that the trained extreme learning machine model can guarantee the stability of robot imitation learning. The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint. The globally asymptotically stable constraint can be expressed as: the condition of formula (PCTCN2017110923-appb-000034) holds, and among the eigenvalues of Φ_i (their number being given by formula (PCTCN2017110923-appb-000035)) there exist d linearly independent eigenvalues, where Φ_i is the "symmetric part" of the expression of formula (PCTCN2017110923-appb-000036), the inequality of formula (PCTCN2017110923-appb-000037) holds, and "<" denotes that a matrix is negative definite. The locally asymptotically stable constraint can be expressed as: the condition of formula (PCTCN2017110923-appb-000038) holds. The formulas referenced here are provided as images in the original publication and are not reproduced in this text.
A model training unit 45, configured to perform supervised training on the extreme learning machine model according to the stability constraints, and set the trained extreme learning machine model as the dynamic prediction model.
In this embodiment of the present invention, the optimization objective min_β ‖Hβ − O‖ of the extreme learning machine model is optimized to obtain a set of output-layer weights β that satisfies the stability constraints and makes the optimization objective optimal. As an example, the optimization objective can be solved by the least squares method to obtain β = H⁺O, which is then subjected to the stability constraints, where H⁺ is the Moore-Penrose generalized inverse of the matrix H. Finally, the trained extreme learning machine model is the trained dynamic prediction model.
A pose acquiring unit 46, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
In this embodiment of the present invention, when a motion or movement instruction sent by a user or a control system is received, the robot can acquire the joint angle of each joint and then calculate the pose of the end effector at the current moment from these joint angles through forward kinematics. In addition, if the robot itself is equipped with a position sensor for the end effector, the pose of the end effector at the current moment can be obtained directly from that position sensor.
A pose determining unit 47, configured to detect whether the pose at the current moment is the preset target pose; if so, determine that the end effector has completed the preset imitation learning task; otherwise, generate the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, the dynamic prediction model being trained from the pre-built extreme learning machine model in combination with the preset stability constraints.
In this embodiment of the present invention, it is detected whether the pose at the current moment is the preset target pose. If so, it can be considered that the end effector has successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until the pose of the end effector is the target pose.
In this embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose of the end effector at the current moment is input into the dynamic prediction model to obtain the motion velocity of the end effector at the current moment output by the dynamic prediction model. From the pose and the motion velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as

x_{t+1} = x_t + ẋ_t · δt

where x_{t+1} is the predicted pose of the end effector at the next moment t+1, x_t is the pose of the end effector at the current moment t, ẋ_t is the output of the dynamic prediction model, and δt is the preset sampling time interval.
A motion adjusting unit 48, configured to adjust the joint angle of each joint according to the predicted pose at the next moment, and acquire the adjusted pose of the end effector.
In this embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, the change of joint angle required for each joint of the robot to move the end effector from the current pose to the predicted pose can be calculated through inverse kinematics, and the joint angle of each joint of the robot is adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, and the adjusted pose of the end effector can be calculated through forward kinematics from the angles of the joints after the robot has been adjusted.
A pose setting unit 49, configured to set the adjusted pose as the pose at the current moment, and cause the pose determining unit 47 to perform the operation of detecting whether the pose at the current moment is the preset target pose.
In this embodiment of the present invention, the dynamic prediction model is trained in advance according to the extreme learning machine model and the stability constraints based on the Lyapunov theorem. Once the pose of the end effector at the current moment is obtained, the pose of the end effector is adjusted by means of the dynamic prediction model until the pose of the end effector at the current moment is the target pose, so that the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
In this embodiment of the present invention, each unit of the imitation learning apparatus for a robot may be implemented by a corresponding hardware or software unit. The units may be independent software or hardware units, or may be integrated into a single software or hardware unit, which is not intended to limit the present invention.
Embodiment 5:
Fig. 5 shows the structure of the robot provided by Embodiment 5 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown.
The robot 5 of this embodiment of the present invention includes a processor 50, a memory 51, and a computer program 52 stored in the memory 51 and executable on the processor 50. When executing the computer program 52, the processor 50 implements the steps in each of the above method embodiments, for example, steps S101 to S106 shown in Fig. 1. Alternatively, when executing the computer program 52, the processor 50 implements the functions of the units in each of the above apparatus embodiments, for example, the functions of units 31 to 34 shown in Fig. 3.
In this embodiment of the present invention, the dynamic prediction model is trained in advance according to the extreme learning machine model and the stability constraints based on the Lyapunov theorem. Once the pose of the end effector at the current moment is obtained, the pose of the end effector is adjusted by means of the dynamic prediction model until the pose of the end effector at the current moment is the target pose, so that the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
Embodiment 6:
In this embodiment of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the above method embodiments are implemented, for example, steps S101 to S106 shown in Fig. 1. Alternatively, when the computer program is executed by a processor, the functions of the units in each of the above apparatus embodiments are implemented, for example, the functions of units 31 to 34 shown in Fig. 3.
In this embodiment of the present invention, the dynamic prediction model is trained in advance according to the extreme learning machine model and the stability constraints based on the Lyapunov theorem. Once the pose of the end effector at the current moment is obtained, the pose of the end effector is adjusted by means of the dynamic prediction model until the pose of the end effector at the current moment is the target pose, so that the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
The computer-readable storage medium of the embodiments of the present invention may include any entity or apparatus capable of carrying computer program code, or a recording medium, for example, a ROM/RAM, a magnetic disk, an optical disc, a flash memory or another memory.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. An imitation learning method for a robot, characterized in that the method comprises the following steps:
    acquiring a pose of an end effector at a current moment when a preset motion instruction is received;
    detecting whether the pose at the current moment is a preset target pose; if so, determining that the end effector has completed a preset imitation learning task; otherwise, generating a predicted pose of the end effector at a next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model in combination with preset stability constraints;
    adjusting a joint angle of each joint according to the predicted pose at the next moment, and acquiring an adjusted pose of the end effector; and
    setting the adjusted pose as the pose at the current moment, and jumping to the step of detecting whether the pose at the current moment is the preset target pose.
  2. The method according to claim 1, characterized in that, before the step of acquiring the pose of the end effector at the current moment when the preset motion instruction is received, the method further comprises:
    constructing the extreme learning machine model, and initializing an input and a target output of the extreme learning machine model according to a training sample set collected during a preset teaching process;
    constructing the stability constraints according to a preset Lyapunov theorem, the stability constraints comprising a globally asymptotically stable constraint and a locally asymptotically stable constraint; and
    performing supervised training on the extreme learning machine model according to the stability constraints, and setting the trained extreme learning machine model as the dynamic prediction model.
  3. The method according to claim 2, characterized in that, before the step of constructing the extreme learning machine model, the method further comprises:
    sampling, during the teaching process, the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval; and
    calculating a velocity at each sampling point of the end effector according to the sampling time interval and the pose of the end effector at each sampling point, and combining the pose and the velocity at each sampling point of the end effector to form training samples of the training sample set.
  4. The method according to claim 3, characterized in that the step of constructing the extreme learning machine model and initializing the input and the target output of the extreme learning machine model according to the training sample set collected during the preset teaching process comprises:
    constructing the extreme learning machine model, the extreme learning machine model being expressed as f(x) = Σ_{i=1}^{N} β_i g(w_i · x + b_i), wherein N, b_i and w_i are respectively the number of neurons, the biases and the weights of the hidden layer in the extreme learning machine model, β = (β_1, …, β_N) is the weight of the output layer in the extreme learning machine network model, and x and g(x) are respectively the input and the activation function of the extreme learning machine model; and
    setting the pose of the end effector and the velocity of the end effector in the training samples of the training sample set as the input and the target output of the extreme learning machine model, respectively, to obtain an optimization objective of the extreme learning machine model, the optimization objective being expressed as min_β ‖Hβ − O‖, wherein H is the hidden-layer output matrix of the extreme learning machine model and O is the velocity of the end effector in the training samples of the training sample set, which is also the target output of the extreme learning machine model.
  5. The method according to claim 2, characterized in that the step of constructing the stability constraints according to the preset Lyapunov theorem comprises:
    constructing the globally asymptotically stable constraint according to the Lyapunov theorem, the globally asymptotically stable constraint being that the condition of formula (PCTCN2017110923-appb-100006) holds and that, among the eigenvalues of Φ_i (their number being given by formula (PCTCN2017110923-appb-100007)), there exist d linearly independent eigenvalues, wherein formula (PCTCN2017110923-appb-100008) holds; and
    constructing the locally asymptotically stable constraint according to the Lyapunov theorem, the locally asymptotically stable constraint being that the condition of formula (PCTCN2017110923-appb-100009) holds; the formulas referenced here are provided as images in the original publication.
  6. An imitation learning apparatus for a robot, characterized in that the apparatus comprises:
    a pose acquiring unit, configured to acquire a pose of an end effector at a current moment when a preset motion instruction is received;
    a pose determining unit, configured to detect whether the pose at the current moment is a preset target pose; if so, determine that the end effector has completed a preset imitation learning task; otherwise, generate a predicted pose of the end effector at a next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model in combination with preset stability constraints;
    a motion adjusting unit, configured to adjust a joint angle of each joint according to the predicted pose at the next moment, and acquire an adjusted pose of the end effector; and
    a pose setting unit, configured to set the adjusted pose as the pose at the current moment, and cause the pose determining unit to perform the operation of detecting whether the pose at the current moment is the preset target pose.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a model construction unit, configured to construct the extreme learning machine model, and initialize an input and a target output of the extreme learning machine model according to a training sample set collected during a preset teaching process;
    a constraint construction unit, configured to construct the stability constraints according to a preset Lyapunov theorem, the stability constraints comprising a globally asymptotically stable constraint and a locally asymptotically stable constraint; and
    a model training unit, configured to perform supervised training on the extreme learning machine model according to the stability constraints, and set the trained extreme learning machine model as the dynamic prediction model.
  8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    a teaching acquisition unit, configured to sample, during the teaching process, the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval; and
    a sample generating unit, configured to calculate a velocity at each sampling point of the end effector according to the sampling time interval and the pose of the end effector at each sampling point, and combine the pose and the velocity at each sampling point of the end effector to form training samples of the training sample set.
  9. A robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the computer program, the processor implements the steps of the method according to any one of claims 1 to 5.
  10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 5 are implemented.
PCT/CN2017/110923 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium WO2019095108A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/110923 WO2019095108A1 (en) 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/110923 WO2019095108A1 (en) 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium

Publications (1)

Publication Number Publication Date
WO2019095108A1 true WO2019095108A1 (en) 2019-05-23

Family

ID=66539209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/110923 WO2019095108A1 (en) 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium

Country Status (1)

Country Link
WO (1) WO2019095108A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002040224A1 (en) * 2000-11-17 2002-05-23 Honda Giken Kogyo Kabushiki Kaisha Gait pattern generating device for legged mobile robot
KR20120077681A (en) * 2010-12-31 2012-07-10 강원대학교산학협력단 Intelligent robot apparatus and method for adaptively customizing according to a command
CN102825603A (en) * 2012-09-10 2012-12-19 江苏科技大学 Network teleoperation robot system and time delay overcoming method
CN106573370A (en) * 2014-04-17 2017-04-19 软银机器人欧洲公司 Omnidirectional wheeled humanoid robot based on a linear predictive position and velocity controller
CN106020190A (en) * 2016-05-26 2016-10-12 山东大学 Track learning controller, control system and method with initial state error correction
CN106346477A (en) * 2016-11-05 2017-01-25 上海新时达电气股份有限公司 Method and module for distinguishing load of six-axis robot
CN106774327A (en) * 2016-12-23 2017-05-31 中新智擎有限公司 A kind of robot path planning method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112535474A (en) * 2020-11-11 2021-03-23 西安交通大学 Lower limb movement joint angle real-time prediction method based on similar rule search
CN112535474B (en) * 2020-11-11 2021-12-28 西安交通大学 Lower limb movement joint angle real-time prediction method based on similar rule search
CN113442145A (en) * 2021-09-01 2021-09-28 北京柏惠维康科技有限公司 Optimal pose determining method and device under constraint, storage medium and mechanical arm

Similar Documents

Publication Publication Date Title
CN108115681B (en) Simulation learning method and device for robot, robot and storage medium
US11529733B2 (en) Method and system for robot action imitation learning in three-dimensional space
Hasan et al. Artificial neural network-based kinematics Jacobian solution for serial manipulator passing through singular configurations
Kolev et al. Physically consistent state estimation and system identification for contacts
US8078321B2 (en) Behavior control system
US8660699B2 (en) Behavior control system and robot
Koutras et al. A correct formulation for the orientation dynamic movement primitives for robot control in the cartesian space
Fang et al. Skill learning for human-robot interaction using wearable device
WO2020118730A1 (en) Compliance control method and apparatus for robot, device, and storage medium
Ott et al. Kinesthetic teaching of humanoid motion based on whole-body compliance control with interaction-aware balancing
CN114102600B (en) Multi-space fusion human-machine skill migration and parameter compensation method and system
Um et al. Independent joint learning: A novel task-to-task transfer learning scheme for robot models
Jetchev et al. Task space retrieval using inverse feedback control
WO2019095108A1 (en) Robot imitation learning method and apparatus, robot and storage medium
Liu et al. Modeling and control of robotic manipulators based on artificial neural networks: a review
Khadivar et al. Adaptive fingers coordination for robust grasp and in-hand manipulation under disturbances and unknown dynamics
Petrič et al. Online approach for altering robot behaviors based on human in the loop coaching gestures
Yan et al. Hierarchical policy learning with demonstration learning for robotic multiple peg-in-hole assembly tasks
Michieletto et al. Robot learning by observing humans activities and modeling failures
Yamane Kinematic redundancy resolution for humanoid robots by human motion database
Monforte et al. Multifunctional principal component analysis for human-like grasping
Zhu Robot Learning Assembly Tasks from Human Demonstrations
Fang et al. Learning from wearable-based teleoperation demonstration
Wei et al. Robotic skills learning based on dynamical movement primitives using a wearable device
Helin et al. Omnidirectional walking based on preview control for biped robots

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17932286

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17932286

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/09/2020)
