WO2019095108A1 - Robot imitation learning method and apparatus, robot and storage medium - Google Patents

Robot imitation learning method and apparatus, robot and storage medium

Info

Publication number
WO2019095108A1
WO2019095108A1 (application no. PCT/CN2017/110923)
Authority
WO
WIPO (PCT)
Prior art keywords
pose
end effector
learning machine
preset
extreme learning
Prior art date
Application number
PCT/CN2017/110923
Other languages
French (fr)
Chinese (zh)
Inventor
欧勇盛
王志扬
段江哗
金少堃
徐升
熊荣
吴新宇
Original Assignee
深圳先进技术研究院
Application filed by 深圳先进技术研究院
Priority to PCT/CN2017/110923
Publication of WO2019095108A1

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 - Programme-controlled manipulators
    • B25J 9/16 - Programme controls

Definitions

  • The invention belongs to the technical field of robots and intelligent control, and in particular relates to an imitation learning method and apparatus for a robot, a robot, and a storage medium.
  • In current robot applications, and especially in industrial applications, the user usually pre-defines the motion trajectory of the robot arm, or presets a certain task environment, so that the robot arm simply repeats the planned motion.
  • Under this control mode, the robot arm cannot cope with changes in the task environment or sudden disturbances, and tasks in complex scenes or more difficult tasks require heavy manual programming. More importantly, the motion trajectory of the robot arm does not implicitly capture human operating habits. Robot imitation learning is an important way to solve these problems.
  • An object of the present invention is to provide an imitation learning method and apparatus for a robot, a robot, and a storage medium, aiming to solve the prior-art problem that the stability, reproduction accuracy, and model training speed of robot imitation learning cannot be guaranteed at the same time.
  • In one aspect, the present invention provides an imitation learning method for a robot, the method comprising the steps of: when a preset motion instruction is received, acquiring the pose of the end effector at the current moment; detecting whether the pose at the current moment is a preset target pose and, if so, determining that the end effector has completed the preset imitation learning task, otherwise generating the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints; adjusting the joint angle of each joint according to the predicted pose at the next moment and acquiring the adjusted pose of the end effector; and setting the adjusted pose as the pose at the current moment and jumping back to the step of detecting whether the pose at the current moment is the preset target pose.
  • In another aspect, the present invention provides an imitation learning device for a robot, the device comprising:
  • a pose acquiring unit, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received;
  • a pose determining unit, configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task, otherwise generate the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints;
  • a motion adjusting unit, configured to adjust the joint angle of each joint according to the predicted pose at the next moment and acquire the adjusted pose of the end effector; and
  • a pose setting unit, configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit performs the operation of detecting whether the pose at the current moment is the preset target pose.
  • In another aspect, the present invention also provides a robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the imitation learning method of the robot described above.
  • In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the imitation learning method of the robot described above.
  • The invention builds an extreme learning machine model in advance and derives stability constraints for the extreme learning machine model; a dynamic prediction model is obtained by training the extreme learning machine model under those stability constraints. When a motion instruction is received, whether the pose of the end effector at the current moment is the target pose is detected; if so, the end effector is determined to have completed the imitation learning task; otherwise, the predicted pose of the end effector at the next moment is generated according to the pose of the end effector at the current moment and the dynamic prediction model, the joints of the end effector are adjusted according to the predicted pose, and the process jumps back to the step of detecting whether the pose of the end effector at the current moment is the target pose. This simultaneously guarantees the stability, reproduction accuracy, and model training speed of robot imitation learning and effectively improves the human-likeness of the robot motion.
  • FIG. 1 is a flowchart of an implementation of an imitation learning method of a robot according to Embodiment 1 of the present invention;
  • FIG. 2 is a flowchart of an implementation of collecting a data sample set and training a dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention;
  • FIG. 3 is a schematic structural diagram of an imitation learning device of a robot according to Embodiment 3 of the present invention;
  • FIG. 4 is a schematic structural diagram of an imitation learning device of a robot according to Embodiment 4 of the present invention; and
  • FIG. 5 is a schematic structural diagram of a robot according to Embodiment 5 of the present invention.
  • Embodiment 1:
  • FIG. 1 is a flowchart showing an implementation process of an imitation learning method for a robot according to Embodiment 1 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, which are described in detail as follows:
  • In step S101, when a preset motion instruction is received, the pose of the end effector at the current moment is acquired.
  • The embodiments of the present invention are applicable to, but not limited to, robots with joints, links, and similar structures that can perform actions such as reaching and grasping.
  • Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment from these joint angles via forward kinematics; alternatively, if the robot is equipped with a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from that sensor. Here the pose includes the position and orientation of the end effector.
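  • As an illustrative sketch only (not part of the patented method), the forward-kinematics step for a hypothetical two-link planar arm could look as follows; the link lengths, joint angles, and planar pose representation are assumptions made for the example:

```python
import numpy as np

# Hypothetical link lengths (metres) of a two-link planar arm; a real robot
# would use its own kinematic parameters or read the pose from a sensor.
L1, L2 = 0.4, 0.3

def forward_kinematics(q):
    """Planar pose (x, y, heading) of the end effector from joint angles q = [q1, q2]."""
    q1, q2 = q
    x = L1 * np.cos(q1) + L2 * np.cos(q1 + q2)
    y = L1 * np.sin(q1) + L2 * np.sin(q1 + q2)
    heading = q1 + q2  # orientation of the last link
    return np.array([x, y, heading])

# Example: pose of the end effector for the current joint angles.
current_pose = forward_kinematics(np.array([0.3, -0.5]))
```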
  • In step S102, it is detected whether the pose at the current moment is the preset target pose.
  • In the embodiment of the present invention, it is detected whether the pose of the end effector at the current moment is the preset target pose; when it is, step S106 is performed, otherwise step S103 is performed.
  • In step S103, the predicted pose of the end effector at the next moment is generated according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
  • In the embodiment of the present invention, when the pose of the end effector at the current moment is not the preset target pose, the pose of the end effector needs to be adjusted. The pre-trained dynamic prediction model predicts the change of the end effector's state from its current state; therefore, after the pose of the end effector at the current moment is fed into the dynamic prediction model, the model outputs the current motion velocity of the end effector. From the pose and velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as $x_{t+1} = x_t + \dot{x}_t\,\delta t$, where $x_{t+1}$ is the predicted pose at the next moment $t+1$, $x_t$ is the pose at the current moment $t$, $\dot{x}_t$ is the output of the dynamic prediction model, and $\delta t$ is the preset sampling time interval.
  • In the embodiment of the present invention, during the training of the dynamic prediction model, an extreme learning machine model is built in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised way under those constraints. The trained extreme learning machine model is then used as the trained dynamic prediction model; the combination of the extreme learning machine and the Lyapunov-derived stability constraints effectively guarantees the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time.
  • The training samples used to train the extreme learning machine model are collected during the user's teaching process; for the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of the steps in Embodiment 2, which is not repeated here.
  • In step S104, the joint angle of each joint is adjusted according to the predicted pose at the next moment, and the adjusted pose of the end effector is acquired.
  • In the embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose is obtained by forward kinematics from the joint angles of the robot after the adjustment.
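  • A sketch of this adjust-and-read-back step, assuming a hypothetical `robot` object whose `inverse_kinematics`, `move_joints`, and `forward_kinematics` methods stand in for the robot's own kinematics and control interfaces:

```python
def adjust_to_predicted_pose(robot, predicted_pose):
    """Move the joints toward the predicted pose and return the pose actually reached."""
    # Joint angles that would place the end effector at the predicted pose.
    target_joint_angles = robot.inverse_kinematics(predicted_pose)
    # Command the joints; the reached angles may differ slightly from the targets.
    reached_joint_angles = robot.move_joints(target_joint_angles)
    # Recover the adjusted pose of the end effector by forward kinematics.
    return robot.forward_kinematics(reached_joint_angles)
```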
  • In step S105, the adjusted pose is set as the pose at the current moment.
  • In the embodiment of the present invention, the adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the process jumps back to step S102 to detect whether the pose of the end effector at the current moment is the preset target pose; this loop continues until the pose of the end effector at the current moment matches the preset target pose.
  • In step S106, it is determined that the end effector has completed the preset imitation learning task.
  • In the embodiment of the present invention, when the adjusted pose of the end effector is the target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task.
  • In the embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment, the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and whether the pose of the end effector at the current moment is the target pose is checked again; this loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
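  • Putting steps S101 to S106 together, one possible shape of the execution loop is sketched below; the `robot` interface, the tolerance used to decide that the target pose has been reached, and the sampling interval are assumptions for illustration:

```python
import numpy as np

def run_imitation_task(robot, dynamic_model, target_pose, delta_t=0.01, tol=1e-3):
    """Iterate steps S102-S105 until the end-effector pose reaches the target pose (S106)."""
    pose = robot.forward_kinematics(robot.get_joint_angles())    # S101: current pose
    while np.linalg.norm(pose - target_pose) > tol:              # S102: target reached?
        velocity = dynamic_model(pose)                           # S103: model output
        predicted_pose = pose + velocity * delta_t               # S103: predicted pose
        joint_angles = robot.inverse_kinematics(predicted_pose)  # S104: required joints
        reached_angles = robot.move_joints(joint_angles)
        pose = robot.forward_kinematics(reached_angles)          # S105: adjusted pose
    return pose                                                  # S106: task completed
```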
  • Embodiment 2:
  • FIG. 2 shows the implementation process of collecting the training sample set and training the dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, described in detail as follows:
  • In step S201, during teaching, the pose of the end effector is sampled on each teaching trajectory of the end effector at a preset sampling time interval.
  • In the embodiment of the present invention, the teaching action may be given by a demonstrator or a user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector on each motion trajectory (teaching trajectory) at the preset sampling time interval.
  • The teaching mode used during teaching is not restricted. As an example, the demonstrator can operate the robot through a remote controller or a teach pendant to give the teaching action, can grasp the end effector and move it along a trajectory in the plane or in space, or can perform the motion task in person while wearing a data glove so that the teaching action is captured by the data glove.
  • In step S202, the velocity at each sampling point of the end effector is calculated from the sampling time interval and the pose at each sampling point, and the pose and velocity at each sampling point of the end effector are combined to form the training samples of the training sample set.
  • In the embodiment of the present invention, after the pose at each sampling point of the end effector is obtained, the velocity at each sampling point can be calculated. As an example, the velocity at each sampling point can be computed as $\dot{x}_k^i = (x_{k+1}^i - x_k^i)/\delta t$, where $\delta t$ is the preset sampling time interval and $\dot{x}_k^i$ is the velocity of the end effector at the $k$-th sampling point on the $i$-th teaching trajectory. The pose and velocity at each sampling point are then combined to form a training sample of the training sample set, which can be written as $\{x_k^i, \dot{x}_k^i\}$.
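  • A sketch of this sample construction using the same finite-difference formula; the layout of the demonstration data (a list of per-trajectory pose arrays) is an assumption:

```python
import numpy as np

def build_training_set(trajectories, delta_t):
    """Build (pose, velocity) training pairs from demonstrated trajectories.

    `trajectories` is a list of arrays, each of shape (N_i, d), holding the poses
    sampled along one teaching trajectory at interval delta_t."""
    poses, velocities = [], []
    for traj in trajectories:
        # Finite-difference velocity (x_{k+1} - x_k) / delta_t; the last sample of
        # each trajectory has no successor and therefore no velocity label.
        vel = (traj[1:] - traj[:-1]) / delta_t
        poses.append(traj[:-1])
        velocities.append(vel)
    X = np.vstack(poses)       # model inputs: poses
    O = np.vstack(velocities)  # target outputs: velocities
    return X, O
```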
  • In step S203, an extreme learning machine model is constructed, and the input and target output of the extreme learning machine model are initialized according to the training sample set collected during the preset teaching process.
  • In the embodiment of the present invention, the extreme learning machine model is a special feed-forward neural network model: it contains only one hidden layer, and the number of hidden-layer neurons as well as their weights and biases are determined randomly. During training, the hidden-layer weights and biases remain fixed and only the output-layer weights are modified. Adopting the extreme learning machine model as the dynamic prediction model for robot imitation learning therefore yields good training results without requiring large-scale training data, and also makes it convenient to add stability constraints to the extreme learning machine model.
  • In the embodiment of the present invention, an extreme learning machine model is constructed, which can be expressed as $\dot{x} = f(x) = \sum_{i=1}^{N} \beta_i\, g(w_i \cdot x + b_i)$, where $N$, $b_i$, and $w_i$ are the number, biases, and weights of the hidden-layer neurons, $\beta = (\beta_1, \ldots, \beta_i, \ldots, \beta_N)$ are the output-layer weights of the extreme learning machine network, and $x$ and $g(x)$ are the input and the activation function of the extreme learning machine model, respectively. The activation function can be a sigmoid function or a hyperbolic tangent (tanh) function; the activation function is not restricted here.
  • The pose of the end effector in each training sample of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training sample is set as the target output of the extreme learning machine model, so the optimization goal of the extreme learning machine model is $\min_{\beta} \lVert H\beta - O \rVert^2$, where $H$ is the hidden-layer output matrix and $O$ is the velocity of the end effector in the training samples, i.e. the target output of the extreme learning machine model.
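  • A sketch of such an extreme learning machine with a randomly drawn, fixed hidden layer; the hidden-layer size and the choice of tanh as the activation function are assumptions within the options the text allows:

```python
import numpy as np

class ExtremeLearningMachine:
    """Single-hidden-layer network with random, fixed hidden weights and biases."""

    def __init__(self, input_dim, hidden_dim=100, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.normal(size=(hidden_dim, input_dim))  # hidden weights w_i (fixed)
        self.b = rng.normal(size=hidden_dim)               # hidden biases b_i (fixed)
        self.beta = None                                   # output-layer weights, learned

    def hidden(self, X):
        """Hidden-layer output matrix H with entries g(w_i . x + b_i), here g = tanh."""
        return np.tanh(X @ self.W.T + self.b)

    def predict(self, X):
        """Velocity prediction x_dot = H(x) @ beta."""
        return self.hidden(X) @ self.beta
```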
  • In step S204, stability constraints are constructed according to the preset Lyapunov theorem; the stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint.
  • In the embodiment of the present invention, stability constraints suitable for the extreme learning machine model are derived based on the Lyapunov theorem, and these constraints are imposed as conditions on the extreme learning machine model so that the trained extreme learning machine model guarantees the stability of the robot's imitation learning.
  • The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint; a generic sketch of the Lyapunov conditions underlying such constraints is given below.
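  • As background only, and not the patent's specific constraint equations, the generic Lyapunov conditions from which such constraints are typically derived can be written as follows, assuming a quadratic Lyapunov function centred at the target pose $x^*$:

```latex
% Generic Lyapunov conditions for global asymptotic stability of the learned
% dynamics \dot{x} = f(x) at the target pose x^*, under the (assumed) quadratic
% Lyapunov function V(x) = (x - x^*)^\top (x - x^*):
\begin{aligned}
  f(x^*) &= 0, \\
  \dot{V}(x) &= 2\,(x - x^*)^\top f(x) < 0 \quad \text{for all } x \neq x^*
\end{aligned}
% i.e. the predicted velocity must always decrease the distance to the target.
```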
  • In step S205, the extreme learning machine model is trained in a supervised way under the stability constraints, and the trained extreme learning machine model is set as the dynamic prediction model.
  • In the embodiment of the present invention, the optimization goal of the extreme learning machine model is optimized to obtain a set of output-layer weights $\beta$ that satisfy the stability constraints while optimizing the objective. The unconstrained optimization target can be solved by least squares as $\beta = H^{+}O$, where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$, and the solution is then restricted by the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model.
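  • A sketch of the unconstrained least-squares fit via the Moore-Penrose pseudoinverse; the constrained version described in the text would replace this with a constrained optimization step enforcing the stability conditions, which is not shown here:

```python
import numpy as np

def fit_output_weights(elm, X, O):
    """Least-squares output weights beta = H^+ O, with H the hidden-layer output
    matrix for the training poses X and O the target velocities."""
    H = elm.hidden(X)
    elm.beta = np.linalg.pinv(H) @ O  # Moore-Penrose generalized inverse of H
    return elm

# Usage sketch: given X, O from the collected demonstrations,
#   elm = ExtremeLearningMachine(input_dim=X.shape[1])
#   fit_output_weights(elm, X, O)
# the fitted elm.predict can then serve as the dynamic prediction model
# (here without the stability constraints of the patent).
```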
  • In the embodiment of the present invention, an extreme learning machine model is constructed, stability constraints suitable for the extreme learning machine model are derived based on the Lyapunov theorem, and the extreme learning machine model is trained under those constraints; the trained extreme learning machine model is the trained dynamic prediction model. This effectively improves the model training speed of robot imitation learning while also guaranteeing the stability and reproduction accuracy of the imitation learning.
  • Embodiment 3:
  • FIG. 3 is a diagram showing the structure of an imitation learning device for a robot according to Embodiment 3 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, including:
  • The pose acquiring unit 31 is configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
  • Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment from the joint angles via forward kinematics; alternatively, if the robot has a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from the sensor.
  • The pose determining unit 32 is configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task; otherwise, it generates the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
  • If the pose at the current moment is the preset target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and the end effector is determined to have completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until it reaches the target pose.
  • The pre-trained dynamic prediction model predicts the change of the end effector's state from its current state; after the pose of the end effector at the current moment is fed into the dynamic prediction model, the model outputs the current motion velocity of the end effector, and the predicted pose of the end effector at the next moment can be calculated as $x_{t+1} = x_t + \dot{x}_t\,\delta t$, where $\delta t$ is the preset sampling time interval.
  • During the training of the dynamic prediction model, an extreme learning machine model is built in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised way under those constraints. The trained extreme learning machine model is then used as the trained dynamic prediction model; the combination of the extreme learning machine and the Lyapunov-derived stability constraints effectively guarantees the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time.
  • The training samples used to train the extreme learning machine model are collected during the user's teaching process; for the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of the corresponding units in Embodiment 4, which is not repeated here.
  • The motion adjusting unit 33 is configured to adjust the joint angle of each joint according to the predicted pose at the next moment and to obtain the adjusted pose of the end effector.
  • After the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose is obtained by forward kinematics from the joint angles of the robot after the adjustment.
  • The pose setting unit 34 is configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit 32 performs the operation of detecting whether the pose at the current moment is the preset target pose.
  • The adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the pose determining unit 32 checks whether this pose is the preset target pose; this loop continues until the pose of the end effector at the current moment matches the preset target pose.
  • When the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment, the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and whether the pose of the end effector at the current moment is the target pose is checked again; this loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
  • Embodiment 4:
  • FIG. 4 is a diagram showing the structure of an imitation learning device for a robot according to Embodiment 4 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, including:
  • The teaching acquisition unit 41 is configured to collect the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval during teaching.
  • The teaching action may be given by a demonstrator or a user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector on each teaching trajectory at the preset sampling time interval.
  • The sample generating unit 42 is configured to calculate the velocity at each sampling point of the end effector from the sampling time interval and the pose at each sampling point, and to combine the pose and velocity at each sampling point of the end effector into the training samples of the training sample set.
  • After the pose at each sampling point of the end effector is obtained, the velocity at each sampling point can be calculated. As an example, the velocity at each sampling point can be computed as $\dot{x}_k^i = (x_{k+1}^i - x_k^i)/\delta t$, where $\delta t$ is the preset sampling time interval and $\dot{x}_k^i$ is the velocity of the end effector at the $k$-th sampling point on the $i$-th teaching trajectory; the pose and velocity at each sampling point are combined into a training sample of the training sample set, which can be written as $\{x_k^i, \dot{x}_k^i\}$.
  • The model construction unit 43 is configured to construct an extreme learning machine model and to initialize the input and target output of the extreme learning machine model according to the training sample set collected during the preset teaching process.
  • The extreme learning machine model can be expressed as $\dot{x} = f(x) = \sum_{i=1}^{N} \beta_i\, g(w_i \cdot x + b_i)$, where $\beta = (\beta_1, \ldots, \beta_i, \ldots, \beta_N)$ are the output-layer weights of the extreme learning machine network and $x$ and $g(x)$ are the input and the activation function of the model, respectively.
  • The pose of the end effector in each training sample of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training sample is set as the target output, so the optimization goal of the extreme learning machine model is $\min_{\beta} \lVert H\beta - O \rVert^2$, where $O$ is the velocity of the end effector in the training samples, i.e. the target output of the extreme learning machine model.
  • The constraint construction unit 44 is configured to construct stability constraints according to the preset Lyapunov theorem; the stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint.
  • Stability constraints suitable for the extreme learning machine model are derived based on the Lyapunov theorem and include the globally asymptotically stable constraint and the locally asymptotically stable constraint.
  • The model training unit 45 is configured to perform supervised training of the extreme learning machine model under the stability constraints and to set the trained extreme learning machine model as the dynamic prediction model.
  • The optimization goal of the extreme learning machine model is optimized to obtain a set of output-layer weights $\beta$ that satisfy the stability constraints while optimizing the objective; the unconstrained optimization target can be solved by least squares as $\beta = H^{+}O$, where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$, and the solution is then restricted by the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model.
  • The pose acquiring unit 46 is configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
  • Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment via forward kinematics; alternatively, if the robot has a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from the sensor.
  • The pose determining unit 47 is configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task; otherwise, it generates the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
  • If the pose at the current moment is the preset target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until it reaches the target pose.
  • When the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model, which outputs the motion velocity of the end effector at the current moment; from this pose and velocity, the predicted pose of the end effector at the next moment can be calculated as $x_{t+1} = x_t + \dot{x}_t\,\delta t$, where $\delta t$ is the preset sampling time interval.
  • The motion adjusting unit 48 is configured to adjust the joint angle of each joint according to the predicted pose at the next moment and to obtain the adjusted pose of the end effector.
  • After the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose is obtained by forward kinematics from the joint angles of the robot after the adjustment.
  • The pose setting unit 49 is configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit 47 performs the operation of detecting whether the pose at the current moment is the preset target pose.
  • In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted iteratively under the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby guaranteeing the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time and effectively improving the human-likeness of the robot motion.
  • In the embodiment of the present invention, each unit of the imitation learning device of the robot may be implemented by corresponding hardware or software; each unit may be an independent software/hardware unit, or the units may be integrated into a single software/hardware unit, and the invention is not limited in this respect.
  • Embodiment 5:
  • Fig. 5 shows the structure of a robot provided in Embodiment 5 of the present invention, and for convenience of explanation, only parts related to the embodiment of the present invention are shown.
  • the robot 5 of the embodiment of the present invention includes a processor 50, a memory 51, and a computer program 52 stored in the memory 51 and operable on the processor 50.
  • The processor 50, when executing the computer program 52, implements the steps in the various method embodiments described above, such as steps S101 through S106 shown in FIG. 1.
  • Alternatively, the processor 50, when executing the computer program 52, implements the functions of the units in the various apparatus embodiments described above, such as the functions of units 31 through 34 shown in FIG. 3.
  • In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted iteratively under the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby guaranteeing the stability, reproduction accuracy, and model training speed of robot imitation learning at the same time and effectively improving the human-likeness of the robot motion.
  • In the embodiment of the present invention, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps in the foregoing method embodiments, for example steps S101 through S106 shown in FIG. 1.
  • Alternatively, the computer program, when executed by the processor, implements the functions of the units in the various apparatus embodiments described above, such as the functions of units 31 through 34 shown in FIG. 3.
  • In the embodiment of the present invention, the dynamic prediction model is trained from the extreme learning machine model and stability constraints based on the Lyapunov theorem, and the pose of the end effector is adjusted iteratively under the dynamic prediction model until the pose of the end effector at the current moment is the target pose, thereby guaranteeing the stability, reproduction accuracy, and model training speed of robot imitation learning and effectively improving the human-likeness of the robot motion.
  • the computer readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, a recording medium such as a ROM/RAM, a magnetic disk, an optical disk, a flash memory, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

A robot imitation learning method and apparatus, a robot, and a storage medium. The method comprises: when a motion instruction is received, acquiring the pose of an end effector at the current moment and detecting whether the pose at the current moment is a target pose; if so, determining that the end effector has completed a preset imitation learning task; otherwise, generating a predicted pose of the end effector at the next moment according to the current pose and a dynamic prediction model, adjusting the angle of each joint according to the predicted pose, setting the adjusted pose of the end effector as the pose at the current moment, and jumping back to the step of detecting whether the pose at the current moment is the target pose. The dynamic prediction model is obtained by training an extreme learning machine model in combination with a preset stability constraint, which guarantees the stability, reproduction accuracy, and model training speed of the robot imitation learning and effectively improves the human-likeness of the robot motion.

Description

Robot imitation learning method and apparatus, robot and storage medium
Technical field
The invention belongs to the technical field of robots and intelligent control, and in particular relates to an imitation learning method and apparatus for a robot, a robot, and a storage medium.
Background
In current robot applications, and especially in industrial applications, the user usually pre-defines the motion trajectory of the robot arm or presets a certain task environment, so that the robot arm simply repeats the planned motion. Under this control mode, the robot arm cannot cope with changes in the task environment or sudden disturbances, and realizing tasks in complex scenes or more difficult tasks requires heavy manual programming. More importantly, the motion trajectory of the robot arm does not implicitly capture human operating habits. Robot imitation learning is an important way to solve these problems.
When modeling robot motion through imitation learning, researchers usually hope to achieve three goals. First, the robot should always move to the desired target; from a control perspective, the system should possess a certain stability, that is, when the robot is disturbed in time or space during its motion and deviates from the trajectory, it can still converge accurately to the target. Second, the motion trajectory of the robot should have a contour as similar as possible to the previously demonstrated human trajectory, i.e. the "accuracy" of the robot's reproduction. Third, the time required by the machine learning method to train the model parameters should be minimized, i.e. the "speed" of model training should be increased.
"Stability", "accuracy", and "speed" usually constrain and conflict with one another, and achieving the best trade-off among the three is the key to robot imitation learning. At present, the best-known robot imitation learning approach internationally models the motion of the robot by building a "dynamical system". The "dynamical system" was initially modeled with a Gaussian mixture model and took stability constraints into account, but because training such a model is relatively complex, it cannot effectively balance "stability", "accuracy", and "speed". Domestic robot imitation learning methods are also mostly based on Gaussian mixture models or Gaussian processes and do not take the stability problem into account, so they likewise cannot effectively balance "stability", "accuracy", and "speed".
Summary of the invention
An object of the present invention is to provide an imitation learning method and apparatus for a robot, a robot, and a storage medium, aiming to solve the prior-art problem that the stability, reproduction accuracy, and model training speed of robot imitation learning cannot be guaranteed at the same time.
In one aspect, the present invention provides an imitation learning method for a robot, the method comprising the following steps:
when a preset motion instruction is received, acquiring the pose of the end effector at the current moment;
detecting whether the pose at the current moment is a preset target pose; if so, determining that the end effector has completed the preset imitation learning task; otherwise, generating the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints;
adjusting the joint angle of each joint according to the predicted pose at the next moment, and acquiring the adjusted pose of the end effector; and
setting the adjusted pose as the pose at the current moment, and jumping back to the step of detecting whether the pose at the current moment is the preset target pose.
In another aspect, the present invention provides an imitation learning device for a robot, the device comprising:
a pose acquiring unit, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received;
a pose determining unit, configured to detect whether the pose at the current moment is a preset target pose and, if so, determine that the end effector has completed the preset imitation learning task, otherwise generate the predicted pose of the end effector at the next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model combined with preset stability constraints;
a motion adjusting unit, configured to adjust the joint angle of each joint according to the predicted pose at the next moment and acquire the adjusted pose of the end effector; and
a pose setting unit, configured to set the adjusted pose as the pose at the current moment, after which the pose determining unit performs the operation of detecting whether the pose at the current moment is the preset target pose.
In another aspect, the present invention also provides a robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the imitation learning method of the robot described above.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the imitation learning method of the robot described above.
The invention builds an extreme learning machine model in advance and derives stability constraints for the extreme learning machine model; a dynamic prediction model is obtained by training the extreme learning machine model under those stability constraints. When a motion instruction is received, whether the pose of the end effector at the current moment is the target pose is detected; if so, the end effector is determined to have completed the imitation learning task; otherwise, the predicted pose of the end effector at the next moment is generated according to the pose of the end effector at the current moment and the dynamic prediction model, the joints of the end effector are adjusted according to the predicted pose, and the process jumps back to the step of detecting whether the pose of the end effector at the current moment is the target pose. In this way the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
Description of the drawings
FIG. 1 is a flowchart of the implementation of the imitation learning method of a robot according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of the implementation of collecting the data sample set and training the dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of the imitation learning device of a robot according to Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of the imitation learning device of a robot according to Embodiment 4 of the present invention; and
FIG. 5 is a schematic structural diagram of the robot according to Embodiment 5 of the present invention.
Detailed description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below in conjunction with specific embodiments:
Embodiment 1:
FIG. 1 shows the implementation process of the imitation learning method of a robot according to Embodiment 1 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, described in detail as follows:
In step S101, when a preset motion instruction is received, the pose of the end effector at the current moment is acquired.
The embodiments of the present invention are applicable to, but not limited to, robots with joints, links, and similar structures that can perform actions such as reaching and grasping. Upon receiving a motion or movement instruction sent by the user or the control system, the robot can read the joint angle of each joint and compute the pose of the end effector at the current moment from these joint angles via forward kinematics; alternatively, if the robot is equipped with a position sensor on the end effector, the pose of the end effector at the current moment can be obtained directly from that sensor. Here the pose includes the position and orientation of the end effector.
In step S102, it is detected whether the pose at the current moment is the preset target pose.
In the embodiment of the present invention, it is detected whether the pose of the end effector at the current moment is the preset target pose; when it is, step S106 is performed, otherwise step S103 is performed.
In step S103, the predicted pose of the end effector at the next moment is generated according to the pose at the current moment and the pre-trained dynamic prediction model, where the dynamic prediction model is trained from a pre-built extreme learning machine model combined with preset stability constraints.
In the embodiment of the present invention, when the pose of the end effector at the current moment is not the preset target pose, the pose of the end effector needs to be adjusted. The pre-trained dynamic prediction model predicts the change of the end effector's state from its current state; therefore, after the pose of the end effector at the current moment is fed into the dynamic prediction model, the model outputs the current motion velocity of the end effector. From the pose and velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as
$$x_{t+1} = x_t + \dot{x}_t\,\delta t$$
where $x_{t+1}$ is the predicted pose of the end effector at the next moment $t+1$, $x_t$ is the pose of the end effector at the current moment $t$, $\dot{x}_t$ is the output of the dynamic prediction model, and $\delta t$ is the preset sampling time interval.
In the embodiment of the present invention, during the training of the dynamic prediction model, an extreme learning machine model is built in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and the extreme learning machine model is trained in a supervised way under those constraints; the trained extreme learning machine model is the trained dynamic prediction model. By combining the extreme learning machine with the stability constraints derived from the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are effectively guaranteed at the same time.
The training samples used to train the extreme learning machine model are collected during the user's teaching process; for the collection of training samples and the training of the dynamic prediction model, refer to the detailed description of the steps in Embodiment 2, which is not repeated here.
In step S104, the joint angle of each joint is adjusted according to the predicted pose at the next moment, and the adjusted pose of the end effector is acquired.
In the embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, inverse kinematics is used to compute how much each joint of the robot must change for the end effector to move from the current pose to the predicted pose, and the joint angles of the robot are adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, so the adjusted pose of the end effector is obtained by forward kinematics from the joint angles of the robot after the adjustment.
In step S105, the adjusted pose is set as the pose at the current moment.
In the embodiment of the present invention, the adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the process jumps back to step S102 to detect whether the pose of the end effector at the current moment is the preset target pose; this loop continues until the pose of the end effector at the current moment matches the preset target pose.
In step S106, it is determined that the end effector has completed the preset imitation learning task.
In the embodiment of the present invention, when the adjusted pose of the end effector is the target pose, the end effector can be considered to have successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task.
In the embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose at the current moment is fed into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment, the angle of each joint is adjusted according to the predicted pose, the adjusted pose of the end effector is obtained, and whether the pose of the end effector at the current moment is the target pose is checked again; this loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy, and model training speed of robot imitation learning are guaranteed at the same time, and the human-likeness of the robot motion is effectively improved.
Embodiment 2:
FIG. 2 shows the implementation process of collecting the training sample set and training the dynamic prediction model in the imitation learning method of a robot according to Embodiment 2 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, described in detail as follows:
In step S201, during teaching, the pose of the end effector is sampled on each teaching trajectory of the end effector at a preset sampling time interval.
In the embodiment of the present invention, the teaching action may be given by a demonstrator or a user during the teaching process; the end effector moves according to the teaching action, and the robot itself or an external motion capture device samples the pose of the end effector on each motion trajectory (teaching trajectory) at the preset sampling time interval. The collected pose of the end effector can be written as $x_k^i$, where $i = 1, \ldots, N_{traj}$, $k = 1, \ldots, N_i$, $N_{traj}$ is the number of teaching trajectories, and $N_i$ is the number of sampling points on the $i$-th teaching trajectory.
In the embodiment of the present invention, the teaching mode used during teaching is not restricted. As an example, the demonstrator can operate the robot through a remote controller or a teach pendant to give the teaching action, can grasp the end effector and move it along a trajectory in the plane or in space, or can perform the motion task in person while wearing a data glove so that the teaching action is captured by the data glove.
在步骤S202中,根据采样时间间隔和末端执行器每个采样点处的位姿,计算末端执行器每个采样点处的速度,将末端执行器每个采样点处的位姿、速度组合构成训练样本集的训练样本。In step S202, according to the sampling time interval and the pose at each sampling point of the end effector, the speed at each sampling point of the end effector is calculated, and the pose and speed combination at each sampling point of the end effector are combined. Train the training samples of the sample set.
In this embodiment of the present invention, after the pose at each sampling point of the end effector has been obtained by sampling, the velocity at each sampling point of the end effector can be calculated. As an example, the velocity at each sampling point of the end effector can be calculated as

ẋ_k^i = (x_{k+1}^i − x_k^i) / δt

where δt is the preset sampling time interval, x_k^i is the pose at the k-th sampling point on the i-th teaching trajectory, and ẋ_k^i is the velocity of the end effector at the k-th sampling point on the i-th teaching trajectory. The pose and the velocity at each sampling point of the end effector are then combined to form the training samples of the training sample set, and a training sample can be expressed as the pair (x_k^i, ẋ_k^i).
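As an illustration of this data-preparation step, the following sketch builds pose-velocity training pairs from demonstrated trajectories by forward differencing. It is not part of the original disclosure; the NumPy helper, the synthetic demonstrations and the 0.01 s sampling interval are assumptions made only for the example.

```python
import numpy as np

def build_training_set(trajectories, dt):
    """Turn demonstrated pose trajectories into (pose, velocity) training pairs.

    trajectories: list of arrays, each of shape (N_i, d), holding the poses
                  sampled on one teaching trajectory at interval dt.
    Returns X (poses) and O (finite-difference velocities), both of shape (M, d).
    """
    poses, velocities = [], []
    for traj in trajectories:
        # forward difference between consecutive samples on the same trajectory
        vel = (traj[1:] - traj[:-1]) / dt
        poses.append(traj[:-1])
        velocities.append(vel)
    return np.vstack(poses), np.vstack(velocities)

# Example with two synthetic 2-D demonstrations ending at the origin
t1 = np.linspace(0.0, 1.0, 200)
demo1 = np.column_stack([1.0 - t1, 1.0 - t1])
t2 = np.linspace(0.0, 1.0, 150)
demo2 = np.column_stack([0.8 * (1.0 - t2), -0.5 * (1.0 - t2)])
X, O = build_training_set([demo1, demo2], dt=0.01)
print(X.shape, O.shape)   # (348, 2) (348, 2)
```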
In step S203, an extreme learning machine model is constructed, and the input and the target output of the extreme learning machine model are initialized according to the training sample set collected during the preset teaching process.
In this embodiment of the present invention, the extreme learning machine model is a special feed-forward neural network model. Its particularity is that it contains only one hidden layer, and the number of neurons, the weights and the biases of the hidden layer are determined randomly. During the training of the extreme learning machine model, the weights and biases of the hidden layer remain unchanged and only the weights of the output layer are modified. Therefore, adopting the extreme learning machine model as the dynamic prediction model for robot imitation learning yields good training results without requiring large-scale training data, and also makes it convenient to add stability constraints to the extreme learning machine model.
In this embodiment of the present invention, an extreme learning machine model is constructed, and the extreme learning machine model can be expressed as:
f(x) = Σ_{i=1}^{N} β_i g(w_i · x + b_i)

where N, b_i and w_i are the number of neurons, the biases and the weights of the hidden layer in the extreme learning machine model, β = (β_1, …, β_i, …, β_N) is the weight of the output layer in the extreme learning machine network model, and x and g(x) are the input and the activation function of the extreme learning machine model, respectively. The activation function may be a sigmoid function or a hyperbolic tangent (tanh) function, and is not limited here.
In addition, the input layer and the output layer of the extreme learning machine model should have the same dimension, i.e. the same number of neurons d: if the end effector moves in a two-dimensional plane, d = 2, and if the end effector moves in three-dimensional space, d = 3.
In this embodiment of the present invention, the pose of the end effector in the training samples of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training samples is set as the target output of the extreme learning machine model, so that the optimization objective of the extreme learning machine model is obtained as:
min_β ‖Hβ − O‖

where H is the hidden-layer output matrix of the extreme learning machine model evaluated on the training inputs, and O is the velocity of the end effector in the training samples, which is also the target output of the extreme learning machine model.
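To make the model structure concrete, the sketch below sets up a randomly initialized hidden layer and the least-squares objective described above. It is only an illustrative reading of the text; the hidden-layer size, the tanh activation and all variable names are assumptions rather than values taken from the original disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2            # pose dimension (2-D plane in this sketch)
n_hidden = 100   # number of hidden neurons, chosen arbitrarily here

# Hidden-layer parameters are drawn once at random and then kept fixed.
W = rng.uniform(-1.0, 1.0, size=(n_hidden, d))   # input weights w_i
b = rng.uniform(-1.0, 1.0, size=n_hidden)        # biases b_i

def hidden_output(X):
    """Hidden-layer output matrix H; entry (k, i) is g(w_i . x_k + b_i)."""
    return np.tanh(X @ W.T + b)                  # tanh as the activation g

def elm_forward(X, beta):
    """Model output f(x) = sum_i beta_i g(w_i . x + b_i) for each row of X."""
    return hidden_output(X) @ beta               # beta has shape (n_hidden, d)

def objective(beta, X, O):
    """Least-squares training objective ||H beta - O||."""
    return np.linalg.norm(hidden_output(X) @ beta - O)
```

Only beta is adjusted during training; W and b stay at their random initial values, which is what turns the training into a linear least-squares problem.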
In step S204, the stability constraints are constructed according to the preset Lyapunov theorem; the stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint.
In this embodiment of the present invention, stability constraints applicable to the extreme learning machine model are derived based on the Lyapunov theorem. The stability constraints impose conditions on the weights in the extreme learning machine model, so that the trained extreme learning machine model can guarantee the stability of robot imitation learning. The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint. The globally asymptotically stable constraint can be expressed as:
the condition of formula (PCTCN2017110923-appb-000013) holds, and among the eigenvalues of Φ_i (their number being given by formula (PCTCN2017110923-appb-000014)) there exist d linearly independent eigenvalues, where Φ_i is the "symmetric part" of the expression of formula (PCTCN2017110923-appb-000015), the inequality of formula (PCTCN2017110923-appb-000016) holds, and "<" denotes that a matrix is negative definite. The locally asymptotically stable constraint can be expressed as: the condition of formula (PCTCN2017110923-appb-000017) holds. The formulas referenced here are provided as images in the original publication and are not reproduced in this text.
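The exact constraint matrices appear only as formula images in the original filing, so the sketch below does not reproduce them. It merely illustrates, under that caveat, the two operations the text relies on: forming the symmetric part of a matrix and testing it for negative definiteness through its eigenvalues.

```python
import numpy as np

def symmetric_part(A):
    """Symmetric part of a square matrix A."""
    return 0.5 * (A + A.T)

def is_negative_definite(A, tol=1e-9):
    """True if the symmetric part of A has only negative eigenvalues,
    i.e. A < 0 in the sense used for the stability constraints."""
    eigvals = np.linalg.eigvalsh(symmetric_part(A))
    return bool(np.all(eigvals < -tol))

# Example: a matrix whose symmetric part is negative definite
A = np.array([[-2.0, 1.0],
              [-1.0, -3.0]])
print(is_negative_definite(A))   # True
```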
In step S205, supervised training is performed on the extreme learning machine model according to the stability constraints, and the trained extreme learning machine model is set as the dynamic prediction model.
In this embodiment of the present invention, the optimization objective min_β ‖Hβ − O‖ of the extreme learning machine model is optimized to obtain a set of output-layer weights β that satisfies the stability constraints and makes the optimization objective optimal. As an example, the optimization objective can be solved by the least squares method to obtain β = H⁺O, which is then subjected to the stability constraints, where H⁺ is the Moore-Penrose generalized inverse of the matrix H. Finally, the trained extreme learning machine model is the trained dynamic prediction model.
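A minimal sketch of the unconstrained least-squares step named in this paragraph is given below, using NumPy's pseudoinverse. How the resulting weights are additionally constrained to satisfy the Lyapunov-based conditions is not shown, since those conditions appear only as formula images in the original publication.

```python
import numpy as np

def fit_output_weights(H, O):
    """Least-squares output weights beta = H^+ O, with H^+ the Moore-Penrose
    generalized inverse of the hidden-layer output matrix H."""
    return np.linalg.pinv(H) @ O

# Typical shapes: H is (n_samples, n_hidden) and O is (n_samples, d),
# so beta comes out as (n_hidden, d). In the patented method, beta would
# additionally be required to satisfy the stability constraints of step S204.
```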
In this embodiment of the present invention, an extreme learning machine model is constructed, stability constraints applicable to the extreme learning machine model are derived based on the Lyapunov theorem, and the extreme learning machine model is trained according to the training sample set collected during the teaching process and the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model, which effectively improves the model training speed of robot imitation learning while guaranteeing the stability and the reproduction accuracy of robot imitation learning.
Embodiment 3:
Fig. 3 shows the structure of the imitation learning apparatus for a robot provided by Embodiment 3 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown, including:
A pose acquiring unit 31, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
In this embodiment of the present invention, when a motion or movement instruction sent by a user or a control system is received, the robot can acquire the joint angle of each joint and then calculate the pose of the end effector at the current moment from these joint angles through forward kinematics. In addition, if the robot itself is equipped with a position sensor for the end effector, the pose of the end effector at the current moment can be obtained directly from that position sensor.
A pose determining unit 32, configured to detect whether the pose at the current moment is the preset target pose; if so, determine that the end effector has completed the preset imitation learning task; otherwise, generate the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, the dynamic prediction model being trained from the pre-built extreme learning machine model in combination with the preset stability constraints.
In this embodiment of the present invention, it is detected whether the pose at the current moment is the preset target pose. If so, it can be considered that the end effector has successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until the pose of the end effector is the target pose. The pre-trained dynamic prediction model is used to predict the change of the current state of the end effector from the current state of the end effector, so after the pose of the end effector at the current moment is input into the dynamic prediction model, the motion velocity of the end effector at the current moment output by the dynamic prediction model is obtained. From the pose and the motion velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as

x_{t+1} = x_t + ẋ_t · δt

where x_{t+1} is the predicted pose of the end effector at the next moment t+1, x_t is the pose of the end effector at the current moment t, ẋ_t is the output of the dynamic prediction model, and δt is the preset sampling time interval.
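The closed-loop use of this one-step prediction can be sketched as follows; the tolerance, the step limit and the toy velocity model used in the example are assumptions for illustration and are not taken from the original disclosure.

```python
import numpy as np

def roll_out(x0, x_target, predict_velocity, dt, tol=1e-3, max_steps=2000):
    """Iterate the one-step prediction until the end effector reaches the target.

    predict_velocity: callable mapping the current pose to the model output,
                      i.e. the velocity predicted by the trained dynamic model.
    Returns the list of poses that were commanded.
    """
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(max_steps):
        if np.linalg.norm(x - x_target) < tol:      # target pose reached
            break
        x = x + predict_velocity(x) * dt            # x_{t+1} = x_t + x_dot_t * dt
        path.append(x.copy())
    return path

# Example with a toy "model" whose predicted velocity always points at the origin
path = roll_out([1.0, 1.0], np.zeros(2), predict_velocity=lambda x: -x, dt=0.05)
print(len(path), path[-1])
```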
In this embodiment of the present invention, during the training of the dynamic prediction model, the extreme learning machine model is constructed in advance, the stability constraints corresponding to the extreme learning machine model are constructed according to the Lyapunov theorem, and supervised training is performed on the extreme learning machine model in combination with the stability constraints. The trained extreme learning machine model is the trained dynamic prediction model, so that through the combination of the extreme learning machine and the stability constraints derived from the Lyapunov theorem, the stability, reproduction accuracy and model training speed of robot imitation learning are effectively guaranteed at the same time.
The training samples used for training the extreme learning machine model are collected during the user's teaching process. For the collection of the training samples and the training of the dynamic prediction model, reference may be made to the detailed description of the corresponding units in Embodiment 4, which is not repeated here.
A motion adjusting unit 33, configured to adjust the joint angle of each joint according to the predicted pose at the next moment, and acquire the adjusted pose of the end effector.
In this embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, the change of joint angle required for each joint of the robot to move the end effector from the current pose to the predicted pose can be calculated through inverse kinematics, and the joint angle of each joint of the robot is adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, and the adjusted pose of the end effector can be calculated through forward kinematics from the angles of the joints after the robot has been adjusted.
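A sketch of one adjustment cycle is shown below. The patent does not specify a kinematics interface, so the robot object and its ik_solve, set_joint_angles, get_joint_angles and fk_pose methods are hypothetical names standing in for whatever inverse- and forward-kinematics routines a concrete robot provides.

```python
def adjust_joints(robot, x_pred):
    """One adjustment cycle: inverse kinematics to the predicted pose, command
    the joints, then read back the realized pose through forward kinematics.

    `robot` is assumed to expose ik_solve(), set_joint_angles(),
    get_joint_angles() and fk_pose(); these names are illustrative only.
    """
    q_target = robot.ik_solve(x_pred)        # joint angles that reach x_pred
    robot.set_joint_angles(q_target)         # move the joints
    q_actual = robot.get_joint_angles()      # angles actually reached
    return robot.fk_pose(q_actual)           # adjusted pose of the end effector
```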
A pose setting unit 34, configured to set the adjusted pose as the pose at the current moment, and cause the pose determining unit 32 to perform the operation of detecting whether the pose at the current moment is the preset target pose.
In this embodiment of the present invention, the adjusted pose of the end effector is set as the pose of the end effector at the current moment, and the pose determining unit 32 performs the operation of detecting whether the pose of the end effector at the current moment is the preset target pose. This loop continues until the pose of the end effector at the current moment is the same as the preset target pose.
In this embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose at the current moment is input into the dynamic prediction model to obtain the predicted pose of the end effector at the next moment; the angle of each joint is adjusted according to the predicted pose to obtain the adjusted pose of the end effector, and it is again determined whether the pose of the end effector at the current moment is the target pose. This loop continues until the pose of the end effector reaches the target pose. By combining the extreme learning machine model with stability constraints based on the Lyapunov theorem, the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
Embodiment 4:
Fig. 4 shows the structure of the imitation learning apparatus for a robot provided by Embodiment 4 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown, including:
A teaching acquisition unit 41, configured to sample the pose of the end effector on each teaching trajectory of the end effector at the preset sampling time interval during the teaching process.
In this embodiment of the present invention, a teaching action may be given by a demonstrator or a user during the teaching process, and the end effector moves according to the teaching action. The pose of the end effector is sampled on each motion trajectory (teaching trajectory) at the preset sampling time interval by the robot itself or by an external motion capture device. The collected poses of the end effector can be denoted as x_k^i, where i = 1, …, N_traj, k = 1, …, N_i, N_traj is the number of teaching trajectories, and N_i is the number of sampling points on the i-th teaching trajectory. The teaching mode used in the teaching process is not limited here.
A sample generating unit 42, configured to calculate the velocity at each sampling point of the end effector according to the sampling time interval and the pose of the end effector at each sampling point, and combine the pose and the velocity at each sampling point of the end effector to form the training samples of the training sample set.
In this embodiment of the present invention, after the pose at each sampling point of the end effector has been obtained by sampling, the velocity at each sampling point of the end effector can be calculated. As an example, the velocity at each sampling point of the end effector can be calculated as

ẋ_k^i = (x_{k+1}^i − x_k^i) / δt

where δt is the preset sampling time interval, x_k^i is the pose at the k-th sampling point on the i-th teaching trajectory, and ẋ_k^i is the velocity of the end effector at the k-th sampling point on the i-th teaching trajectory. The pose and the velocity at each sampling point of the end effector are then combined to form the training samples of the training sample set, and a training sample can be expressed as the pair (x_k^i, ẋ_k^i).
A model construction unit 43, configured to construct the extreme learning machine model, and initialize the input and the target output of the extreme learning machine model according to the training sample set collected during the preset teaching process.
In this embodiment of the present invention, an extreme learning machine model is constructed, and the extreme learning machine model can be expressed as:
f(x) = Σ_{i=1}^{N} β_i g(w_i · x + b_i)

where N, b_i and w_i are the number of neurons, the biases and the weights of the hidden layer in the extreme learning machine model, β = (β_1, …, β_i, …, β_N) is the weight of the output layer in the extreme learning machine network model, and x and g(x) are the input and the activation function of the extreme learning machine model, respectively; the activation function is not limited here.
In addition, the input layer and the output layer of the extreme learning machine model should have the same dimension, i.e. the same number of neurons d: if the end effector moves in a two-dimensional plane, d = 2, and if the end effector moves in three-dimensional space, d = 3.
In this embodiment of the present invention, the pose of the end effector in the training samples of the training sample set is set as the input of the extreme learning machine model, and the velocity of the end effector in the training samples is set as the target output of the extreme learning machine model, so that the optimization objective of the extreme learning machine model is obtained as:
min_β ‖Hβ − O‖

where H is the hidden-layer output matrix of the extreme learning machine model evaluated on the training inputs, and O is the velocity of the end effector in the training samples, which is also the target output of the extreme learning machine model.
A constraint construction unit 44, configured to construct the stability constraints according to the preset Lyapunov theorem, the stability constraints including a globally asymptotically stable constraint and a locally asymptotically stable constraint.
In this embodiment of the present invention, stability constraints applicable to the extreme learning machine model are derived based on the Lyapunov theorem. The stability constraints impose conditions on the weights in the extreme learning machine model, so that the trained extreme learning machine model can guarantee the stability of robot imitation learning. The stability constraints include a globally asymptotically stable constraint and a locally asymptotically stable constraint. The globally asymptotically stable constraint can be expressed as: the condition of formula (PCTCN2017110923-appb-000034) holds, and among the eigenvalues of Φ_i (their number being given by formula (PCTCN2017110923-appb-000035)) there exist d linearly independent eigenvalues, where Φ_i is the "symmetric part" of the expression of formula (PCTCN2017110923-appb-000036), the inequality of formula (PCTCN2017110923-appb-000037) holds, and "<" denotes that a matrix is negative definite. The locally asymptotically stable constraint can be expressed as: the condition of formula (PCTCN2017110923-appb-000038) holds. The formulas referenced here are provided as images in the original publication and are not reproduced in this text.
A model training unit 45, configured to perform supervised training on the extreme learning machine model according to the stability constraints, and set the trained extreme learning machine model as the dynamic prediction model.
In this embodiment of the present invention, the optimization objective min_β ‖Hβ − O‖ of the extreme learning machine model is optimized to obtain a set of output-layer weights β that satisfies the stability constraints and makes the optimization objective optimal. As an example, the optimization objective can be solved by the least squares method to obtain β = H⁺O, which is then subjected to the stability constraints, where H⁺ is the Moore-Penrose generalized inverse of the matrix H. Finally, the trained extreme learning machine model is the trained dynamic prediction model.
A pose acquiring unit 46, configured to acquire the pose of the end effector at the current moment when a preset motion instruction is received.
In this embodiment of the present invention, when a motion or movement instruction sent by a user or a control system is received, the robot can acquire the joint angle of each joint and then calculate the pose of the end effector at the current moment from these joint angles through forward kinematics. In addition, if the robot itself is equipped with a position sensor for the end effector, the pose of the end effector at the current moment can be obtained directly from that position sensor.
A pose determining unit 47, configured to detect whether the pose at the current moment is the preset target pose; if so, determine that the end effector has completed the preset imitation learning task; otherwise, generate the predicted pose of the end effector at the next moment according to the pose at the current moment and the pre-trained dynamic prediction model, the dynamic prediction model being trained from the pre-built extreme learning machine model in combination with the preset stability constraints.
In this embodiment of the present invention, it is detected whether the pose at the current moment is the preset target pose. If so, it can be considered that the end effector has successfully imitated the human motion characteristics and converged to the target point, and it is determined that the end effector has completed the preset imitation learning task; otherwise, the pose of the end effector needs to be adjusted until the pose of the end effector is the target pose.
In this embodiment of the present invention, when the pose of the end effector at the current moment is not the target pose, the pose of the end effector at the current moment is input into the dynamic prediction model to obtain the motion velocity of the end effector at the current moment output by the dynamic prediction model. From the pose and the motion velocity of the end effector at the current moment, the predicted pose of the end effector at the next moment can be calculated as

x_{t+1} = x_t + ẋ_t · δt

where x_{t+1} is the predicted pose of the end effector at the next moment t+1, x_t is the pose of the end effector at the current moment t, ẋ_t is the output of the dynamic prediction model, and δt is the preset sampling time interval.
A motion adjusting unit 48, configured to adjust the joint angle of each joint according to the predicted pose at the next moment, and acquire the adjusted pose of the end effector.
In this embodiment of the present invention, after the predicted pose of the end effector at the next moment is obtained, the change of joint angle required for each joint of the robot to move the end effector from the current pose to the predicted pose can be calculated through inverse kinematics, and the joint angle of each joint of the robot is adjusted accordingly. Because of errors and limited precision in the adjustment process, the adjusted pose of the end effector may differ from the predicted pose, and the adjusted pose of the end effector can be calculated through forward kinematics from the angles of the joints after the robot has been adjusted.
A pose setting unit 49, configured to set the adjusted pose as the pose at the current moment, and cause the pose determining unit 47 to perform the operation of detecting whether the pose at the current moment is the preset target pose.
In this embodiment of the present invention, the dynamic prediction model is trained in advance according to the extreme learning machine model and the stability constraints based on the Lyapunov theorem. Once the pose of the end effector at the current moment is obtained, the pose of the end effector is adjusted by means of the dynamic prediction model until the pose of the end effector at the current moment is the target pose, so that the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
In this embodiment of the present invention, each unit of the imitation learning apparatus for a robot may be implemented by a corresponding hardware or software unit. The units may be independent software or hardware units, or may be integrated into a single software or hardware unit, which is not intended to limit the present invention.
Embodiment 5:
Fig. 5 shows the structure of the robot provided by Embodiment 5 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown.
The robot 5 of this embodiment of the present invention includes a processor 50, a memory 51, and a computer program 52 stored in the memory 51 and executable on the processor 50. When executing the computer program 52, the processor 50 implements the steps in each of the above method embodiments, for example, steps S101 to S106 shown in Fig. 1. Alternatively, when executing the computer program 52, the processor 50 implements the functions of the units in each of the above apparatus embodiments, for example, the functions of units 31 to 34 shown in Fig. 3.
In this embodiment of the present invention, the dynamic prediction model is trained in advance according to the extreme learning machine model and the stability constraints based on the Lyapunov theorem. Once the pose of the end effector at the current moment is obtained, the pose of the end effector is adjusted by means of the dynamic prediction model until the pose of the end effector at the current moment is the target pose, so that the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
Embodiment 6:
In this embodiment of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the above method embodiments are implemented, for example, steps S101 to S106 shown in Fig. 1. Alternatively, when the computer program is executed by a processor, the functions of the units in each of the above apparatus embodiments are implemented, for example, the functions of units 31 to 34 shown in Fig. 3.
In this embodiment of the present invention, the dynamic prediction model is trained in advance according to the extreme learning machine model and the stability constraints based on the Lyapunov theorem. Once the pose of the end effector at the current moment is obtained, the pose of the end effector is adjusted by means of the dynamic prediction model until the pose of the end effector at the current moment is the target pose, so that the stability, reproduction accuracy and model training speed of robot imitation learning are guaranteed at the same time, and the human-like quality of the robot motion is effectively improved.
The computer-readable storage medium of the embodiments of the present invention may include any entity or apparatus capable of carrying computer program code, or a recording medium, for example, a ROM/RAM, a magnetic disk, an optical disc, a flash memory or another memory.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. An imitation learning method for a robot, characterized in that the method comprises the following steps:
    acquiring a pose of an end effector at a current moment when a preset motion instruction is received;
    detecting whether the pose at the current moment is a preset target pose; if so, determining that the end effector has completed a preset imitation learning task; otherwise, generating a predicted pose of the end effector at a next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model in combination with preset stability constraints;
    adjusting a joint angle of each joint according to the predicted pose at the next moment, and acquiring an adjusted pose of the end effector; and
    setting the adjusted pose as the pose at the current moment, and jumping to the step of detecting whether the pose at the current moment is the preset target pose.
  2. The method according to claim 1, characterized in that, before the step of acquiring the pose of the end effector at the current moment when the preset motion instruction is received, the method further comprises:
    constructing the extreme learning machine model, and initializing an input and a target output of the extreme learning machine model according to a training sample set collected during a preset teaching process;
    constructing the stability constraints according to a preset Lyapunov theorem, the stability constraints comprising a globally asymptotically stable constraint and a locally asymptotically stable constraint; and
    performing supervised training on the extreme learning machine model according to the stability constraints, and setting the trained extreme learning machine model as the dynamic prediction model.
  3. The method according to claim 2, characterized in that, before the step of constructing the extreme learning machine model, the method further comprises:
    sampling, during the teaching process, the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval; and
    calculating a velocity at each sampling point of the end effector according to the sampling time interval and the pose of the end effector at each sampling point, and combining the pose and the velocity at each sampling point of the end effector to form training samples of the training sample set.
  4. The method according to claim 3, characterized in that the step of constructing the extreme learning machine model and initializing the input and the target output of the extreme learning machine model according to the training sample set collected during the preset teaching process comprises:
    constructing the extreme learning machine model, the extreme learning machine model being expressed as f(x) = Σ_{i=1}^{N} β_i g(w_i · x + b_i), wherein N, b_i and w_i are respectively the number of neurons, the biases and the weights of the hidden layer in the extreme learning machine model, β = (β_1, …, β_N) is the weight of the output layer in the extreme learning machine network model, and x and g(x) are respectively the input and the activation function of the extreme learning machine model; and
    setting the pose of the end effector and the velocity of the end effector in the training samples of the training sample set as the input and the target output of the extreme learning machine model, respectively, to obtain an optimization objective of the extreme learning machine model, the optimization objective being expressed as min_β ‖Hβ − O‖, wherein H is the hidden-layer output matrix of the extreme learning machine model and O is the velocity of the end effector in the training samples of the training sample set, which is also the target output of the extreme learning machine model.
  5. The method according to claim 2, characterized in that the step of constructing the stability constraints according to the preset Lyapunov theorem comprises:
    constructing the globally asymptotically stable constraint according to the Lyapunov theorem, the globally asymptotically stable constraint being that the condition of formula (PCTCN2017110923-appb-100006) holds and that, among the eigenvalues of Φ_i (their number being given by formula (PCTCN2017110923-appb-100007)), there exist d linearly independent eigenvalues, wherein formula (PCTCN2017110923-appb-100008) holds; and
    constructing the locally asymptotically stable constraint according to the Lyapunov theorem, the locally asymptotically stable constraint being that the condition of formula (PCTCN2017110923-appb-100009) holds; the formulas referenced here are provided as images in the original publication.
  6. An imitation learning apparatus for a robot, characterized in that the apparatus comprises:
    a pose acquiring unit, configured to acquire a pose of an end effector at a current moment when a preset motion instruction is received;
    a pose determining unit, configured to detect whether the pose at the current moment is a preset target pose; if so, determine that the end effector has completed a preset imitation learning task; otherwise, generate a predicted pose of the end effector at a next moment according to the pose at the current moment and a pre-trained dynamic prediction model, the dynamic prediction model being trained from a pre-built extreme learning machine model in combination with preset stability constraints;
    a motion adjusting unit, configured to adjust a joint angle of each joint according to the predicted pose at the next moment, and acquire an adjusted pose of the end effector; and
    a pose setting unit, configured to set the adjusted pose as the pose at the current moment, and cause the pose determining unit to perform the operation of detecting whether the pose at the current moment is the preset target pose.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a model construction unit, configured to construct the extreme learning machine model, and initialize an input and a target output of the extreme learning machine model according to a training sample set collected during a preset teaching process;
    a constraint construction unit, configured to construct the stability constraints according to a preset Lyapunov theorem, the stability constraints comprising a globally asymptotically stable constraint and a locally asymptotically stable constraint; and
    a model training unit, configured to perform supervised training on the extreme learning machine model according to the stability constraints, and set the trained extreme learning machine model as the dynamic prediction model.
  8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    a teaching acquisition unit, configured to sample, during the teaching process, the pose of the end effector on each teaching trajectory of the end effector at a preset sampling time interval; and
    a sample generating unit, configured to calculate a velocity at each sampling point of the end effector according to the sampling time interval and the pose of the end effector at each sampling point, and combine the pose and the velocity at each sampling point of the end effector to form training samples of the training sample set.
  9. A robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the computer program, the processor implements the steps of the method according to any one of claims 1 to 5.
  10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 5 are implemented.
PCT/CN2017/110923 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium WO2019095108A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/110923 WO2019095108A1 (en) 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/110923 WO2019095108A1 (en) 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium

Publications (1)

Publication Number Publication Date
WO2019095108A1 true WO2019095108A1 (en) 2019-05-23

Family

ID=66539209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/110923 WO2019095108A1 (en) 2017-11-14 2017-11-14 Robot imitation learning method and apparatus, robot and storage medium

Country Status (1)

Country Link
WO (1) WO2019095108A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002040224A1 (en) * 2000-11-17 2002-05-23 Honda Giken Kogyo Kabushiki Kaisha Gait pattern generating device for legged mobile robot
KR20120077681A (en) * 2010-12-31 2012-07-10 강원대학교산학협력단 Intelligent robot apparatus and method for adaptively customizing according to a command
CN102825603A (en) * 2012-09-10 2012-12-19 江苏科技大学 Network teleoperation robot system and time delay overcoming method
CN106573370A (en) * 2014-04-17 2017-04-19 软银机器人欧洲公司 Omnidirectional wheeled humanoid robot based on a linear predictive position and velocity controller
CN106020190A (en) * 2016-05-26 2016-10-12 山东大学 Track learning controller, control system and method with initial state error correction
CN106346477A (en) * 2016-11-05 2017-01-25 上海新时达电气股份有限公司 Method and module for distinguishing load of six-axis robot
CN106774327A (en) * 2016-12-23 2017-05-31 中新智擎有限公司 A kind of robot path planning method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112535474A (en) * 2020-11-11 2021-03-23 西安交通大学 Lower limb movement joint angle real-time prediction method based on similar rule search
CN112535474B (en) * 2020-11-11 2021-12-28 西安交通大学 Lower limb movement joint angle real-time prediction method based on similar rule search
CN113442145A (en) * 2021-09-01 2021-09-28 北京柏惠维康科技有限公司 Optimal pose determining method and device under constraint, storage medium and mechanical arm

Similar Documents

Publication Publication Date Title
CN108115681B (en) Simulation learning method and device for robot, robot and storage medium
US11529733B2 (en) Method and system for robot action imitation learning in three-dimensional space
Hasan et al. Artificial neural network-based kinematics Jacobian solution for serial manipulator passing through singular configurations
Kolev et al. Physically consistent state estimation and system identification for contacts
US8078321B2 (en) Behavior control system
US8660699B2 (en) Behavior control system and robot
Koutras et al. A correct formulation for the orientation dynamic movement primitives for robot control in the cartesian space
Fang et al. Skill learning for human-robot interaction using wearable device
WO2020118730A1 (en) Compliance control method and apparatus for robot, device, and storage medium
Ott et al. Kinesthetic teaching of humanoid motion based on whole-body compliance control with interaction-aware balancing
CN114102600B (en) Multi-space fusion human-machine skill migration and parameter compensation method and system
Um et al. Independent joint learning: A novel task-to-task transfer learning scheme for robot models
Jetchev et al. Task space retrieval using inverse feedback control
WO2019095108A1 (en) Robot imitation learning method and apparatus, robot and storage medium
Liu et al. Modeling and control of robotic manipulators based on artificial neural networks: a review
Khadivar et al. Adaptive fingers coordination for robust grasp and in-hand manipulation under disturbances and unknown dynamics
Petrič et al. Online approach for altering robot behaviors based on human in the loop coaching gestures
Yan et al. Hierarchical policy learning with demonstration learning for robotic multiple peg-in-hole assembly tasks
Michieletto et al. Robot learning by observing humans activities and modeling failures
Yamane Kinematic redundancy resolution for humanoid robots by human motion database
Monforte et al. Multifunctional principal component analysis for human-like grasping
Zhu Robot Learning Assembly Tasks from Human Demonstrations
Fang et al. Learning from wearable-based teleoperation demonstration
Wei et al. Robotic skills learning based on dynamical movement primitives using a wearable device
Helin et al. Omnidirectional walking based on preview control for biped robots

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17932286

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17932286

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/09/2020)
