WO2019095108A1 - Robot imitation learning method and apparatus, robot and storage medium - Google Patents
- Publication number
- WO2019095108A1 (PCT/CN2017/110923)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pose
- end effector
- learning machine
- preset
- extreme learning
- Prior art date
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
Definitions
- The invention belongs to the technical field of robots and intelligent control, and in particular relates to an imitation learning method, apparatus, robot and storage medium for a robot.
- In the prior art, the user usually pre-defines the movement trajectory of the robot arm, or presets a certain task environment, so that the robot arm repeatedly executes the plan.
- As a result, the robot arm cannot cope with changes in the task environment or sudden disturbances, and complex scenes or more difficult tasks require extensive manual programming. More importantly, the resulting arm movements carry none of the implicit operating habits of a human operator. Robot imitation learning is an important way to solve these problems.
- An object of the present invention is to provide an imitation learning method, apparatus, robot and storage medium for a robot, aiming to solve the problem in the prior art that the stability, reproduction accuracy and model training speed of robot imitation learning cannot be guaranteed at the same time.
- the present invention provides an imitation learning method for a robot, the method comprising the steps of:
- the present invention provides an imitation learning device for a robot, the device comprising:
- a posture acquiring unit configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received;
- a posture determining unit configured to detect whether the pose of the current moment is a preset target pose, and if yes, determine that the end effector has completed the preset imitation learning task, otherwise generate the predicted pose of the end effector at the next moment according to the pose of the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model in combination with preset stability constraints;
- a motion adjustment unit configured to adjust the joint angle of each joint according to the predicted pose of the next moment, and to obtain the adjusted pose of the end effector; and
- a posture setting unit configured to set the adjusted pose as the pose of the current moment, and to cause the posture determining unit to perform the operation of detecting whether the pose of the current moment is the preset target pose.
- The present invention also provides a robot comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the imitation learning method of the robot described above.
- The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the imitation learning method of the robot described above.
- The invention constructs the extreme learning machine model in advance, derives the stability constraints of the extreme learning machine model, and trains the dynamic prediction model from the extreme learning machine model combined with the stability constraints. When a motion instruction is received, the pose of the end effector at the current moment is detected.
- If the pose of the current moment is the target pose, it is determined that the end effector has completed the imitation learning task; otherwise, the predicted pose of the end effector at the next moment is generated from the pose of the current moment and the dynamic prediction model, the joints of the end effector are adjusted according to the predicted pose, and the process jumps back to the step of detecting whether the pose of the end effector at the current moment is the target pose. This ensures the stability, reproduction accuracy and model training speed of the robot imitation learning, and effectively improves the human-likeness of the robot's movement.
- FIG. 1 is a flowchart of an implementation of an imitation learning method of a robot according to Embodiment 1 of the present invention;
- FIG. 2 is a flowchart of an implementation of collecting a training sample set and training a dynamic prediction model in an imitation learning method of a robot according to Embodiment 2 of the present invention;
- FIG. 3 is a schematic structural diagram of an imitation learning device of a robot according to Embodiment 3 of the present invention.
- FIG. 4 is a schematic structural diagram of an imitation learning device of a robot according to Embodiment 4 of the present invention.
- FIG. 5 is a schematic structural diagram of a robot according to Embodiment 5 of the present invention.
- Embodiment 1:
- FIG. 1 is a flowchart showing an implementation process of an imitation learning method for a robot according to Embodiment 1 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, which are described in detail as follows:
- step S101 when a preset motion instruction is received, the pose of the current moment of the end effector is acquired.
- The embodiments of the present invention are applicable to, but not limited to, robots having structures such as joints or links, which can perform actions such as reaching and grasping.
- The robot can acquire the joint angle of each joint and then calculate the pose of the end effector from the joint angles by forward kinematics; if the robot itself is equipped with a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from that sensor. Here, the pose includes the position and orientation of the end effector.
- step S102 it is detected whether the pose of the current moment is a preset target pose.
- It is detected whether the pose of the end effector at the current moment is the preset target pose; when it is, step S106 is performed, otherwise step S103 is performed.
- step S103 based on the pose of the current moment and the pre-trained dynamic prediction model, the predicted pose of the end effector at the next moment is generated; the dynamic prediction model is trained from a pre-built extreme learning machine model in combination with preset stability constraints.
- When the pose of the end effector at the current moment is not the preset target pose, the pose of the end effector needs to be adjusted.
- The pre-trained dynamic prediction model predicts how the state of the end effector changes from its current state: after the pose of the end effector at the current moment is input into the dynamic prediction model, the model outputs the velocity at which the end effector currently moves. From the pose and the velocity of the end effector at the current moment, the predicted pose at the next moment can be calculated.
- The calculation formula can be expressed as x(t+1) = x(t) + v(t)·Δt, where x(t) is the pose of the end effector at the current moment, v(t) is the velocity output by the dynamic prediction model, and Δt is the preset sampling time interval.
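As an illustrative sketch (not part of the patent disclosure), the prediction step above can be written as follows; the pose values, velocity and sampling interval are hypothetical:

```python
import numpy as np

def predict_next_pose(pose, velocity, dt):
    """Integrate the velocity output by the dynamic prediction model over dt."""
    return pose + velocity * dt

pose_now = np.array([0.3, 0.2, 0.5])    # current end-effector position (m), assumed
velocity = np.array([0.1, 0.0, -0.05])  # model's velocity output (m/s), assumed
pose_next = predict_next_pose(pose_now, velocity, dt=0.01)
```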
- In the training process of the dynamic prediction model, the extreme learning machine model is constructed in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised manner under these stability constraints.
- The trained extreme learning machine model is the trained dynamic prediction model. Combining the extreme learning machine with Lyapunov-based stability constraints effectively guarantees the stability, reproduction accuracy and model training speed of the robot imitation learning.
- The training samples for training the extreme learning machine model are collected during the user's teaching process. For the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of each step in Embodiment 2; details are not repeated here.
- step S104 the joint angle of each joint is adjusted based on the predicted pose of the next moment, and the adjusted pose of the end effector is acquired.
- After the predicted pose of the end effector at the next moment is obtained, inverse kinematics can be used to calculate the angle by which each joint of the robot needs to change in order to move the end effector from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the actual pose of the end effector may differ from the predicted pose, so the adjusted pose of the end effector is calculated by forward kinematics from the joint angles measured after the adjustment.
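To make the forward-kinematics step concrete, here is a minimal sketch for a hypothetical planar two-link arm (link lengths and joint angles are assumptions, not values from the disclosure); after the joints are adjusted, the actual end-effector position is recomputed from the measured joint angles:

```python
import numpy as np

def forward_kinematics_2link(theta1, theta2, l1=0.4, l2=0.3):
    """Planar 2-link forward kinematics: joint angles -> end-effector position."""
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return np.array([x, y])

# Recompute the actual pose from the adjusted joint angles rather than
# trusting the predicted pose, since adjustment errors accumulate.
actual_pose = forward_kinematics_2link(np.pi / 4, -np.pi / 6)
```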
- step S105 the adjusted pose is set as the pose of the current time.
- The adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the process jumps to step S102 to detect whether the pose of the end effector at the current moment is the preset target pose. This loop continues until the pose of the end effector matches the preset target pose.
- step S106 it is determined that the end effector completes the preset imitation learning task.
- When the adjusted pose of the end effector is the target pose, the end effector can be considered to have successfully imitated the motion characteristics of the human and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task.
- In the embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose of the current moment is input into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment; the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and it is again determined whether the pose of the end effector is the target pose, and so on, until the pose of the end effector is the target pose. Combining the extreme learning machine model with stability constraints based on the Lyapunov theorem thus ensures the stability, reproduction accuracy and model training speed of the robot imitation learning, and effectively improves the human-likeness of the robot's movement.
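The reproduction loop of steps S101 to S106 can be sketched as follows; the convergence tolerance and the stand-in linear model (used in place of the trained extreme learning machine) are assumptions for illustration:

```python
import numpy as np

def imitation_loop(pose, target, model, dt=0.01, tol=1e-3, max_steps=10000):
    """Query the dynamic prediction model until the target pose is reached.

    `model` maps a pose to a velocity; here any callable stands in for the
    trained extreme learning machine."""
    for _ in range(max_steps):
        if np.linalg.norm(pose - target) < tol:
            return pose, True          # imitation task completed (step S106)
        velocity = model(pose)         # dynamic prediction model output (step S103)
        pose = pose + velocity * dt    # a real robot would move its joints via
                                       # inverse kinematics and re-read the pose
    return pose, False

target = np.array([0.5, 0.1])
stable_model = lambda x: -2.0 * (x - target)   # toy globally stable dynamics
final_pose, done = imitation_loop(np.array([0.0, 0.0]), target, stable_model)
```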
- Embodiment 2:
- FIG. 2 is a flowchart showing an implementation process of collecting a training sample set and training a dynamic prediction model in an imitation learning method of a robot according to Embodiment 2 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, described in detail as follows:
- step S201 the pose of the end effector is acquired on each teaching track of the end effector according to a preset sampling time interval during teaching.
- The teaching action may be given by a teacher or user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector at the preset sampling time interval.
- The teaching mode used in the teaching process is not limited: the teacher can operate the robot through a remote controller or a teach pendant to give the teaching action, can grasp the end effector and move it along a trajectory in the plane or in space, or can wear a data glove while completing the motion task so that the teaching action is collected by the data glove.
- step S202 the velocity at each sampling point of the end effector is calculated according to the sampling time interval and the poses at the sampling points of the end effector, and the pose and velocity at each sampling point of the end effector are combined to form the training samples of the training sample set.
- According to the sampling time interval and the poses at adjacent sampling points, the velocity at each sampling point of the end effector can be calculated. As an example, the velocity at the kth sampling point on the ith teaching trajectory of the end effector can be expressed as v(i,k) = (x(i,k+1) - x(i,k)) / Δt, where x(i,k) is the pose at the kth sampling point on the ith trajectory and Δt is the preset sampling time interval. The pose x(i,k) and the velocity v(i,k) are combined to form a training sample (x(i,k), v(i,k)) of the training sample set.
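A minimal sketch of this sample construction (the array shapes and demonstration values are assumptions):

```python
import numpy as np

def build_training_samples(trajectories, dt):
    """Pair each pose with its forward-difference velocity: v_k = (x_{k+1} - x_k) / dt."""
    poses, velocities = [], []
    for traj in trajectories:        # traj: (T, d) array of poses on one teaching trajectory
        vel = np.diff(traj, axis=0) / dt
        poses.append(traj[:-1])      # the last pose has no forward difference
        velocities.append(vel)
    return np.vstack(poses), np.vstack(velocities)

demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.3, 0.1]])  # hypothetical demonstration
X, V = build_training_samples([demo], dt=0.1)
```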
- step S203 an extreme learning machine model is constructed, and the input and target output of the extreme learning machine model are initialized according to the training sample set acquired during the preset teaching process.
- The extreme learning machine model is a special feed-forward neural network model; it is special in that it contains only one hidden layer, and the number, weights and offsets of the hidden-layer neurons are determined randomly and left untrained.
- The extreme learning machine model is adopted as the dynamic prediction model of the robot imitation learning: good training results can be obtained even with large-scale training data, and stability constraints can conveniently be added to the extreme learning machine model.
- An extreme learning machine model is constructed, which can be expressed as f(x) = Σ i=1..N β i g(w i · x + b i ), where β = (β 1 , ..., β i , ..., β N ) are the output-layer weights of the extreme learning machine network model, w i and b i are the randomly assigned input weight and bias of the ith hidden-layer neuron, and x and g(·) are the input and the activation function of the extreme learning machine model, respectively.
- The activation function may be a sigmoid function or a hyperbolic tangent (tanh) function; the activation function is not limited here.
- The pose of the end effector in each training sample of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training sample is set as the target output of the extreme learning machine model. The optimization goal of the extreme learning machine model is thereby obtained as min β ||Hβ - O||, where H is the hidden-layer output matrix computed from the inputs, β is the output-layer weight, and O is the velocity of the end effector in the training samples, which is also the target output of the extreme learning machine model.
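A minimal sketch of such a single-hidden-layer model (the layer sizes, seed and tanh activation are assumptions); only the output weights β are ever trained, while the input weights and biases stay at their random values:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_hidden = 3, 3, 50      # pose in, velocity out; sizes assumed

W = rng.normal(size=(n_hidden, d_in))  # random input weights, never trained
b = rng.normal(size=n_hidden)          # random hidden biases, never trained

def hidden_matrix(X):
    """H[j, i] = g(w_i . x_j + b_i) with g = tanh, for a batch of inputs X."""
    return np.tanh(X @ W.T + b)

def elm_predict(X, beta):
    """f(x) = sum_i beta_i g(w_i . x + b_i): only beta is learned."""
    return hidden_matrix(X) @ beta
```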
- step S204 a stability constraint condition is constructed according to the preset Lyapunov theorem, and the stability constraint condition includes a globally asymptotically stable constraint condition and a locally asymptotically stable constraint condition.
- A stability constraint suited to the extreme learning machine model is derived based on the Lyapunov theorem, and the extreme learning machine model is restricted by this condition, so that the trained extreme learning machine model guarantees the stability of the robot imitation learning.
- The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint. The globally asymptotically stable constraint can be expressed as:
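The original expression is not reproduced in this text. As a hedged reconstruction, a standard Lyapunov formulation for learned dynamics ẋ = f(x) with target pose x* uses the candidate function below; the global condition requires the velocity field to point toward the target everywhere:

```latex
V(x) = \tfrac{1}{2}\,\lVert x - x^{*}\rVert^{2}, \qquad
\dot{V}(x) = (x - x^{*})^{\top} f(x) < 0 \;\; \forall\, x \neq x^{*}, \qquad
f(x^{*}) = 0
```

Under these conditions V strictly decreases along every trajectory, so the end effector converges to x* from any initial pose; the locally asymptotically stable variant relaxes the inequality to a neighborhood of x*.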
- step S205 the extreme learning machine model is trained in a supervised manner according to the stability constraints, and the trained extreme learning machine model is set as the dynamic prediction model.
- The optimization goal of the extreme learning machine model is optimized to obtain a set of output-layer weights β that satisfy the stability constraints while optimizing the objective. The optimization objective can be solved by least squares, giving β = H⁺O, which is then restricted by the stability constraints, where H⁺ is the Moore-Penrose generalized inverse of the matrix H.
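A minimal sketch of this least-squares step (the matrix sizes and synthetic data are assumptions; a full implementation would additionally enforce the stability constraints on β):

```python
import numpy as np

# Synthetic stand-ins: H is the hidden-layer output matrix (n_samples x n_hidden),
# O the target outputs (end-effector velocities, n_samples x d).
rng = np.random.default_rng(1)
H = rng.normal(size=(100, 20))
beta_true = rng.normal(size=(20, 3))
O = H @ beta_true

beta = np.linalg.pinv(H) @ O   # beta = H^+ O via the Moore-Penrose pseudoinverse
# The stability constraints would then be checked or imposed on beta before use.
```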
- the trained extreme learning machine model is the trained dynamic prediction model.
- In the embodiment of the present invention, an extreme learning machine model is constructed, a stability constraint suited to the extreme learning machine model is derived based on the Lyapunov theorem, and the extreme learning machine model is trained under this constraint. The trained extreme learning machine model is the trained dynamic prediction model, which effectively improves the model training speed of the robot imitation learning while ensuring its stability and reproduction accuracy.
- Embodiment 3:
- FIG. 3 is a diagram showing the structure of an imitation learning device for a robot according to Embodiment 3 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, including:
- the posture obtaining unit 31 is configured to acquire a pose of the current moment of the end effector when receiving the preset motion instruction.
- When receiving a motion or movement instruction sent by the user or the control system, the robot can acquire the joint angle of each joint and then calculate the pose of the end effector at the current moment by forward kinematics from the joint angles; if the end effector is equipped with a position sensor, the pose of the end effector at the current moment can be obtained directly from that sensor.
- The posture determining unit 32 is configured to detect whether the pose of the current moment is a preset target pose; if yes, determine that the end effector has completed the preset imitation learning task; otherwise, generate the predicted pose of the end effector at the next moment according to the pose of the current moment and a pre-trained dynamic prediction model. The dynamic prediction model is trained from a pre-built extreme learning machine model in combination with preset stability constraints.
- If the pose of the current moment is the preset target pose, the end effector can be considered to have successfully imitated the motion characteristics of the human and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until it reaches the target pose.
- The pre-trained dynamic prediction model predicts how the state of the end effector changes from its current state: after the pose of the end effector at the current moment is input into the dynamic prediction model, the model outputs the velocity at which the end effector currently moves. The predicted pose of the end effector at the next moment is then calculated as x(t+1) = x(t) + v(t)·Δt, where Δt is the preset sampling time interval.
- In the training process of the dynamic prediction model, the extreme learning machine model is constructed in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised manner under these stability constraints.
- The trained extreme learning machine model is the trained dynamic prediction model. Combining the extreme learning machine with Lyapunov-based stability constraints effectively guarantees the stability, reproduction accuracy and model training speed of the robot imitation learning.
- The training samples for training the extreme learning machine model are collected during the user's teaching process. For the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of the corresponding units in Embodiment 4; details are not repeated here.
- The motion adjusting unit 33 is configured to adjust the joint angle of each joint according to the predicted pose of the next moment, and to obtain the adjusted pose of the end effector.
- After the predicted pose of the end effector at the next moment is obtained, inverse kinematics can be used to calculate the angle by which each joint of the robot needs to change in order to move the end effector from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the actual pose of the end effector may differ from the predicted pose, so the adjusted pose of the end effector is calculated by forward kinematics from the joint angles measured after the adjustment.
- The posture setting unit 34 is configured to set the adjusted pose as the pose of the current moment, and to cause the posture determining unit 32 to perform the operation of detecting whether the pose of the current moment is the preset target pose.
- The adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the posture determining unit 32 detects whether the pose of the end effector at the current moment is the preset target pose. This loop continues until the pose of the end effector matches the preset target pose.
- In the embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose of the current moment is input into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment; the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and it is again determined whether the pose of the end effector is the target pose, and so on, until the pose of the end effector is the target pose. Combining the extreme learning machine model with stability constraints based on the Lyapunov theorem thus ensures the stability, reproduction accuracy and model training speed of the robot imitation learning, and effectively improves the human-likeness of the robot's movement.
- Embodiment 4:
- FIG. 4 is a diagram showing the structure of an imitation learning device for a robot according to Embodiment 4 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, including:
- the teaching acquisition unit 41 is configured to collect the pose of the end effector on each teaching track of the end effector according to a preset sampling time interval during teaching.
- The teaching action may be given by a teacher or user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector at the preset sampling time interval.
- The sample generating unit 42 is configured to calculate the velocity at each sampling point of the end effector according to the sampling time interval and the poses at the sampling points of the end effector, and to combine the pose and velocity at each sampling point of the end effector into the training samples of the training sample set.
- According to the sampling time interval and the poses at adjacent sampling points, the velocity at each sampling point of the end effector can be calculated. As an example, the velocity at the kth sampling point on the ith teaching trajectory of the end effector can be expressed as v(i,k) = (x(i,k+1) - x(i,k)) / Δt, where x(i,k) is the pose at the kth sampling point on the ith trajectory and Δt is the preset sampling time interval. The pose x(i,k) and the velocity v(i,k) are combined to form a training sample (x(i,k), v(i,k)) of the training sample set.
- the model construction unit 43 is configured to construct an extreme learning machine model, and initialize an input and a target output of the extreme learning machine model according to the training sample set collected in the preset teaching process.
- An extreme learning machine model is constructed, which can be expressed as f(x) = Σ i=1..N β i g(w i · x + b i ), where β = (β 1 , ..., β i , ..., β N ) are the output-layer weights of the extreme learning machine network model, w i and b i are the randomly assigned input weight and bias of the ith hidden-layer neuron, and x and g(·) are the input and the activation function of the extreme learning machine model, respectively.
- The pose of the end effector in each training sample of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training sample is set as the target output of the extreme learning machine model. The optimization goal of the extreme learning machine model is thereby obtained as min β ||Hβ - O||, where H is the hidden-layer output matrix computed from the inputs, β is the output-layer weight, and O is the velocity of the end effector in the training samples, which is also the target output of the extreme learning machine model.
- the constraint construction unit 44 is configured to construct a stability constraint according to the preset Lyapunov theorem, and the stability constraint includes a global asymptotically stable constraint condition and a local asymptotically stable constraint condition.
- A stability constraint suited to the extreme learning machine model is derived based on the Lyapunov theorem. The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint; the globally asymptotically stable constraint takes the same form as described in Embodiment 2.
- the model training unit 45 is configured to perform supervised training on the extreme learning machine model according to the stability constraint condition, and set the trained extreme learning machine model as a dynamic prediction model.
- The optimization goal of the extreme learning machine model is optimized to obtain a set of output-layer weights β that satisfy the stability constraints while optimizing the objective. The optimization objective can be solved by least squares, giving β = H⁺O, which is then restricted by the stability constraints, where H⁺ is the Moore-Penrose generalized inverse of the matrix H.
- the trained extreme learning machine model is the trained dynamic prediction model.
- the posture obtaining unit 46 is configured to acquire a pose of the current moment of the end effector when receiving the preset motion instruction.
- When receiving a motion or movement instruction sent by the user or the control system, the robot can acquire the joint angle of each joint and then calculate the pose of the end effector at the current moment by forward kinematics from the joint angles; if the end effector is equipped with a position sensor, the pose of the end effector at the current moment can be obtained directly from that sensor.
- The posture determining unit 47 is configured to detect whether the pose of the current moment is a preset target pose; if yes, determine that the end effector has completed the preset imitation learning task; otherwise, generate the predicted pose of the end effector at the next moment according to the pose of the current moment and a pre-trained dynamic prediction model. The dynamic prediction model is trained from a pre-built extreme learning machine model in combination with preset stability constraints.
- If the pose of the current moment is the preset target pose, the end effector can be considered to have successfully imitated the motion characteristics of the human and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until it reaches the target pose.
- When the pose of the end effector at the current moment is not the target pose, the pose of the current moment is input into the dynamic prediction model, and the velocity of the end effector at the current moment output by the dynamic prediction model is obtained. From the pose and the velocity at the current moment, the predicted pose of the end effector at the next moment can be calculated as x(t+1) = x(t) + v(t)·Δt, where Δt is the preset sampling time interval.
- The motion adjusting unit 48 is configured to adjust the joint angle of each joint according to the predicted pose of the next moment, and to obtain the adjusted pose of the end effector.
- After the predicted pose of the end effector at the next moment is obtained, inverse kinematics can be used to calculate the angle by which each joint of the robot needs to change in order to move the end effector from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the actual pose of the end effector may differ from the predicted pose, so the adjusted pose of the end effector is calculated by forward kinematics from the joint angles measured after the adjustment.
- The pose setting unit 49 is configured to set the adjusted pose as the pose of the current moment, and to cause the posture determining unit 47 to perform the operation of detecting whether the pose of the current moment is the preset target pose.
- In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and the stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted according to the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby ensuring the stability, reproduction accuracy and model training speed of the robot imitation learning and effectively improving the human-likeness of the robot's movement.
- Each unit of the imitation learning device of the robot may be implemented by a corresponding hardware or software unit; the units may be independent software or hardware units, or may be integrated into a single software or hardware unit, which is not intended to limit the present invention.
- Embodiment 5:
- Fig. 5 shows the structure of a robot provided in Embodiment 5 of the present invention, and for convenience of explanation, only parts related to the embodiment of the present invention are shown.
- the robot 5 of the embodiment of the present invention includes a processor 50, a memory 51, and a computer program 52 stored in the memory 51 and operable on the processor 50.
- The processor 50, when executing the computer program 52, implements the steps in the method embodiments described above, such as steps S101 through S106 shown in FIG. 1.
- Alternatively, the processor 50, when executing the computer program 52, implements the functions of the units in the apparatus embodiments described above, such as the functions of units 31 through 34 shown in FIG. 3.
- In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and the stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted according to the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby ensuring the stability, reproduction accuracy and model training speed of the robot imitation learning and effectively improving the human-likeness of the robot's movement.
- A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps in the foregoing method embodiments, for example steps S101 to S106 shown in FIG. 1.
- Alternatively, the computer program, when executed by the processor, implements the functions of the units in the foregoing apparatus embodiments, such as the functions of units 31 through 34 shown in FIG. 3.
- in the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model combined with stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted by the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby ensuring the stability, reproduction accuracy, and model training speed of the robot's imitation learning, and effectively improving the human-likeness of the robot's movement.
- the computer-readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, or a recording medium such as a ROM/RAM, a magnetic disk, an optical disk, or a flash memory.
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Manipulator (AREA)
Abstract
Description
Claims (10)
- 1. A robot imitation learning method, characterized in that the method comprises the following steps: when a preset motion instruction is received, acquiring the pose of the end effector at the current moment; detecting whether the pose at the current moment is a preset target pose; if so, determining that the end effector has completed the preset imitation learning task; otherwise, generating a predicted pose of the end effector for the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints; adjusting the joint angle of each joint according to the predicted pose for the next moment, and acquiring the adjusted pose of the end effector; and setting the adjusted pose as the pose at the current moment, and returning to the step of detecting whether the pose at the current moment is the preset target pose.
- 2. The method of claim 1, characterized in that, before the step of acquiring the pose of the end effector at the current moment when a preset motion instruction is received, the method further comprises: constructing the extreme learning machine model, and initializing the input and target output of the extreme learning machine model according to a training sample set collected during a preset teaching process; constructing the stability constraints according to the Lyapunov theorem, the stability constraints comprising a globally asymptotically stable constraint and a locally asymptotically stable constraint; and performing supervised training on the extreme learning machine model according to the stability constraints, and setting the trained extreme learning machine model as the dynamic prediction model.
- 3. The method of claim 2, characterized in that, before the step of constructing the extreme learning machine model, the method further comprises: during the teaching process, collecting the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval; and calculating the velocity of the end effector at each sampling point according to the sampling time interval and the pose of the end effector at each sampling point, and combining the pose and velocity of the end effector at each sampling point into training samples of the training sample set.
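Claim 3 builds training samples by pairing each sampled pose with a velocity computed from the sampling time interval. A minimal sketch, assuming a forward-difference approximation with zero velocity at the final sample so the model learns to stop at the goal (the patent does not fix the exact difference scheme):

```python
import numpy as np

def build_training_samples(poses, dt):
    """Turn one demonstrated trajectory into (pose, velocity) training pairs.

    poses : (T, d) array of end-effector poses sampled every dt seconds
    dt    : preset sampling time interval
    """
    poses = np.asarray(poses, dtype=float)
    vels = np.zeros_like(poses)
    # forward difference: v_k = (x_{k+1} - x_k) / dt
    vels[:-1] = (poses[1:] - poses[:-1]) / dt
    # vels[-1] stays zero: the demonstration ends at rest at the target pose
    return list(zip(poses, vels))
```

Each (pose, velocity) pair then serves as one (input, target-output) training sample for the extreme learning machine model.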
- 4. The method of claim 3, characterized in that the step of constructing the extreme learning machine model and initializing the input and target output of the extreme learning machine model according to the training sample set collected during the preset teaching process comprises: constructing the extreme learning machine model, the extreme learning machine model being expressed as f(x) = Σ_{i=1}^{N} β_i g(w_i·x + b_i), where N, b_i, and w_i are respectively the number, biases, and weights of the hidden-layer neurons in the extreme learning machine model, β = (β_1, …, β_N) are the weights of the output layer in the extreme learning machine network model, and x and g(x) are respectively the input and the activation function of the extreme learning machine model; and setting the pose of the end effector and the velocity of the end effector in the training samples of the training sample set as the input and the target output of the extreme learning machine model, respectively, to obtain the optimization objective of the extreme learning machine model.
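The extreme learning machine in claim 4 keeps the hidden-layer weights w_i and biases b_i random and fixed, and fits only the output weights β. A minimal unconstrained least-squares sketch for illustration only: the patent additionally imposes the Lyapunov-based stability constraints of claim 5 during training, which are omitted here, and `tanh` is an assumed choice for the activation function g.

```python
import numpy as np

def elm_train(X, Y, n_hidden=50, seed=0):
    """Fit an extreme learning machine: random hidden layer, least-squares output.

    Implements f(x) = sum_i beta_i * g(w_i . x + b_i) with g = tanh (assumed).
    X : (n_samples, d_in) inputs, Y : (n_samples, d_out) target outputs.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random input weights w_i
    b = rng.standard_normal(n_hidden)                # fixed random biases b_i
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)     # output weights by least squares
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate the trained model: f(x) = g(x W + b) beta."""
    return np.tanh(np.asarray(X) @ W + b) @ beta
```

Because only β is solved for (a single linear least-squares problem), training is fast, which is the property the patent relies on for its model training speed.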
- 5. The method of claim 2, characterized in that the step of constructing the stability constraints according to the Lyapunov theorem comprises: constructing the globally asymptotically stable constraint according to the Lyapunov theorem, the globally asymptotically stable constraint requiring that, among the eigenvalues of Φ_i, there exist d linearly independent eigenvalues; and constructing the locally asymptotically stable constraint according to the Lyapunov theorem.
- 6. A robot imitation learning apparatus, characterized in that the apparatus comprises: a pose acquisition unit, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received; a pose judgment unit, configured to detect whether the pose at the current moment is a preset target pose and, if so, to determine that the end effector has completed the preset imitation learning task, and otherwise to generate a predicted pose of the end effector for the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints; a motion adjustment unit, configured to adjust the joint angle of each joint according to the predicted pose for the next moment and to acquire the adjusted pose of the end effector; and a pose setting unit, configured to set the adjusted pose as the pose at the current moment, whereupon the pose judgment unit performs the operation of detecting whether the pose at the current moment is the preset target pose.
- 7. The apparatus of claim 6, characterized in that the apparatus further comprises: a model construction unit, configured to construct the extreme learning machine model and to initialize the input and target output of the extreme learning machine model according to a training sample set collected during a preset teaching process; a constraint construction unit, configured to construct the stability constraints according to the Lyapunov theorem, the stability constraints comprising a globally asymptotically stable constraint and a locally asymptotically stable constraint; and a model training unit, configured to perform supervised training on the extreme learning machine model according to the stability constraints and to set the trained extreme learning machine model as the dynamic prediction model.
- 8. The apparatus of claim 7, characterized in that the apparatus further comprises: a teaching acquisition unit, configured to collect the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval during the teaching process; and a sample generation unit, configured to calculate the velocity of the end effector at each sampling point according to the sampling time interval and the pose of the end effector at each sampling point, and to combine the pose and velocity of the end effector at each sampling point into training samples of the training sample set.
- 9. A robot comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 5.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/110923 WO2019095108A1 (en) | 2017-11-14 | 2017-11-14 | Robot imitation learning method and apparatus, robot and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019095108A1 true WO2019095108A1 (en) | 2019-05-23 |
Family
ID=66539209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/110923 WO2019095108A1 (en) | 2017-11-14 | 2017-11-14 | Robot imitation learning method and apparatus, robot and storage medium |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2019095108A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002040224A1 (en) * | 2000-11-17 | 2002-05-23 | Honda Giken Kogyo Kabushiki Kaisha | Gait pattern generating device for legged mobile robot |
KR20120077681A (en) * | 2010-12-31 | 2012-07-10 | 강원대학교산학협력단 | Intelligent robot apparatus and method for adaptively customizing according to a command |
CN102825603A (en) * | 2012-09-10 | 2012-12-19 | 江苏科技大学 | Network teleoperation robot system and time delay overcoming method |
CN106020190A (en) * | 2016-05-26 | 2016-10-12 | 山东大学 | Track learning controller, control system and method with initial state error correction |
CN106346477A (en) * | 2016-11-05 | 2017-01-25 | 上海新时达电气股份有限公司 | Method and module for distinguishing load of six-axis robot |
CN106573370A (en) * | 2014-04-17 | 2017-04-19 | 软银机器人欧洲公司 | Omnidirectional wheeled humanoid robot based on a linear predictive position and velocity controller |
CN106774327A (en) * | 2016-12-23 | 2017-05-31 | 中新智擎有限公司 | A kind of robot path planning method and device |
- 2017-11-14: WO application PCT/CN2017/110923 filed as WO2019095108A1 (Application Filing, active)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112535474A (en) * | 2020-11-11 | 2021-03-23 | 西安交通大学 | Lower limb movement joint angle real-time prediction method based on similar rule search |
CN112535474B (en) * | 2020-11-11 | 2021-12-28 | 西安交通大学 | Lower limb movement joint angle real-time prediction method based on similar rule search |
CN113442145A (en) * | 2021-09-01 | 2021-09-28 | 北京柏惠维康科技有限公司 | Optimal pose determining method and device under constraint, storage medium and mechanical arm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108115681B (en) | Simulation learning method and device for robot, robot and storage medium | |
US11529733B2 (en) | Method and system for robot action imitation learning in three-dimensional space | |
Hasan et al. | Artificial neural network-based kinematics Jacobian solution for serial manipulator passing through singular configurations | |
Kolev et al. | Physically consistent state estimation and system identification for contacts | |
US8078321B2 (en) | Behavior control system | |
US8660699B2 (en) | Behavior control system and robot | |
Koutras et al. | A correct formulation for the orientation dynamic movement primitives for robot control in the cartesian space | |
Fang et al. | Skill learning for human-robot interaction using wearable device | |
WO2020118730A1 (en) | Compliance control method and apparatus for robot, device, and storage medium | |
Ott et al. | Kinesthetic teaching of humanoid motion based on whole-body compliance control with interaction-aware balancing | |
CN114102600B (en) | Multi-space fusion human-machine skill migration and parameter compensation method and system | |
Um et al. | Independent joint learning: A novel task-to-task transfer learning scheme for robot models | |
Jetchev et al. | Task space retrieval using inverse feedback control | |
WO2019095108A1 (en) | Robot imitation learning method and apparatus, robot and storage medium | |
Liu et al. | Modeling and control of robotic manipulators based on artificial neural networks: a review | |
Khadivar et al. | Adaptive fingers coordination for robust grasp and in-hand manipulation under disturbances and unknown dynamics | |
Petrič et al. | Online approach for altering robot behaviors based on human in the loop coaching gestures | |
Yan et al. | Hierarchical policy learning with demonstration learning for robotic multiple peg-in-hole assembly tasks | |
Michieletto et al. | Robot learning by observing humans activities and modeling failures | |
Yamane | Kinematic redundancy resolution for humanoid robots by human motion database | |
Monforte et al. | Multifunctional principal component analysis for human-like grasping | |
Zhu | Robot Learning Assembly Tasks from Human Demonstrations | |
Fang et al. | Learning from wearable-based teleoperation demonstration | |
Wei et al. | Robotic skills learning based on dynamical movement primitives using a wearable device | |
Helin et al. | Omnidirectional walking based on preview control for biped robots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17932286 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17932286 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/09/2020) |
|