WO2022088593A1

WO2022088593A1 - Robotic arm control method and device, and human-machine cooperation model training method

Info

Publication number: WO2022088593A1
Application number: PCT/CN2021/082254
Authority: WO
Inventors: 段星光; 田焕玉; 温浩; 李长胜; 李建玺; 田野; 靳励行; 孟繁盛
Original assignee: 北京理工大学
Priority date: 2020-10-26
Filing date: 2021-03-23
Publication date: 2022-05-05
Also published as: CN112428278B; CN112428278A

Abstract

Disclosed are a robotic arm control method and device, and a human-machine cooperation model training method. The robotic arm control method comprises: obtaining a human-machine cooperation model, the human-machine cooperation model being a model for determining an expected attitude of a robotic arm according to a human-machine interaction force; obtaining the attitude at the current moment, and obtaining, according to the human-machine cooperation model, an expected attitude corresponding to the human-machine interaction force at the current moment; generating, according to the attitude at the current moment and the expected attitude corresponding to the human-machine interaction force at the current moment, an optimal trajectory where the robotic arm moves; and controlling the robotic arm according to the optimal trajectory. The present application solves the problem that a robot cannot move along a trajectory intended by the human.

Description

Control method and device of robotic arm and training method of man-machine collaborative model

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority of the Chinese patent application filed on October 26, 2020, with the application number 2020111594282 and the invention titled "Control Method, Device and Human-Machine Collaborative Model of Robotic Arm", all of which The contents are incorporated herein by reference.

technical field

The present application relates to the field of robotic arms, and in particular, to a control method and device for a robotic arm and a training method for a human-machine collaborative model.

Background technique

In the field of orthopedics and puncture robotics, there is a class of robots that can be used in the field of surgery, which interact with both the doctor and the environment. This type of robot can move according to the interaction force of the doctor and do work in the environment. However, when the related technology is dragging towards a specific trajectory (such as arc, straight line), the robot cannot judge the human attempt according to the human performance behavior, which makes the robot Unable to move along the trajectory of human intention, how to control the robot to accurately understand the doctor's intention and optimize the robot-doctor interaction experience has become an urgent problem to be solved.

For the problem that the robot cannot move along the trajectory intended by the human, no effective solution has been proposed yet.

SUMMARY OF THE INVENTION

The main purpose of this application is to provide a method for controlling a robotic arm, so as to solve the problem that the robot cannot move along the trajectory intended by humans.

In order to achieve the above purpose, the present application provides a control method and device for a robotic arm and a training method for a human-machine collaborative model.

In a first aspect, the present application provides a method for controlling a robotic arm.

The control method of the robotic arm according to the present application includes:

Obtaining a man-machine collaboration model, wherein the man-machine collaboration model is a model for determining the desired pose of the robotic arm according to the human-machine interaction force;

Obtain the pose at the current moment, and obtain the desired pose corresponding to the human-computer interaction force at the current moment according to the human-machine collaboration model;

Generate the optimal trajectory of the motion of the robotic arm according to the pose at the current moment and the desired pose corresponding to the human-computer interaction force at the current moment;

The robotic arm is controlled according to the optimal trajectory.

Further, generating the optimal trajectory of the motion of the robotic arm according to the pose at the current moment and the desired pose corresponding to the human-computer interaction force at the current moment includes:

Through the model predictive control MPC algorithm, multiple groups of random trajectories are generated according to the pose at the current moment and the expected pose corresponding to the human-computer interaction force at the current moment;

An optimal trajectory is selected from the set of random trajectories.

Further, the selecting the optimal trajectory from the multiple groups of random trajectories includes:

From the sets of random trajectories, an optimal trajectory is selected through an optimal trajectory control algorithm.

Further, the controlling the robotic arm according to the optimal trajectory includes:

Obtain the position and attitude angle motion information of the robotic arm;

The first mode control is performed on the normal component of the position and attitude angle motion information of the robotic arm;

The second mode control is performed on the tangential components of the position and attitude angle motion information of the robotic arm; wherein, the first mode is a robot guidance mode in which the robotic arm admittance is greater than the robotic arm admittance of the second mode; so The second mode is a human-guided mode in which the human admittance is greater than the human admittance of the first mode.

In a second aspect, the present application provides a training method for a human-machine collaboration model, which is used to obtain the human-machine collaboration model in the control method for a robotic arm in the first aspect.

The training method of the human-machine collaborative model according to the present application includes:

Obtaining multiple sets of human-computer interaction forces of the robotic arm and multiple sets of robotic arm poses corresponding to the multiple sets of human-computer interaction forces, where the multiple sets of human-computer interaction forces are multiple sets of original human-computer interaction forces;

A human-machine collaboration model is established according to the multiple sets of human-computer interaction forces and the multiple sets of robotic arm poses.

Further, after the man-machine collaboration model is established according to the multiple sets of human-computer interaction forces and the multiple sets of robotic arm poses, the method further includes:

The human-machine collaborative model is optimized according to the supervised learning method.

In a third aspect, the present application provides a control device for a robotic arm.

The control device of the robotic arm according to the present application includes:

A model acquisition module for acquiring a man-machine collaborative model, wherein the man-machine collaborative model is a model for determining the desired pose of the robotic arm according to the man-machine interaction force;

a pose obtaining module, configured to obtain the pose at the current moment, and obtain the desired pose corresponding to the human-computer interaction force at the current moment according to the human-machine collaboration model;

a trajectory generation module, configured to generate the optimal trajectory of the motion of the robotic arm according to the current moment posture and the desired posture and posture corresponding to the human-computer interaction force at the current moment;

The control module is used to control the robotic arm according to the optimal trajectory.

Further, the model acquisition module includes:

The optimization unit is used to optimize the human-machine collaborative model according to the supervised learning method.

Further, the trajectory generation module includes:

a random trajectory generation unit, configured to control the MPC algorithm through model prediction, and generate multiple sets of random trajectories according to the current moment pose and the expected pose corresponding to the human-computer interaction force at the current moment;

An optimal trajectory generation unit, configured to select an optimal trajectory from the multiple groups of random trajectories.

Further, the optimal trajectory generation unit further includes:

for selecting an optimal trajectory from the plurality of sets of random trajectories through an optimal trajectory control algorithm.

Further, the control module includes:

The controller control unit is used for controlling the manipulator through the controller of the manipulator according to the optimal trajectory, wherein the controller includes an inner layer controller that controls the manipulator and a man-machine collaboration model. Controlled outer controller.

In a fourth aspect, the present application provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the control method for a robotic arm provided in the first aspect and/or the second aspect Provide the steps of the training method of the human-machine collaboration model.

In a fifth aspect, the present application provides a robot, including a robotic arm, a sensor, a controller, a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the program when the processor executes the program. The steps of the control method of the robotic arm provided by the first aspect and/or the training method of the human-machine collaboration model provided by the second aspect.

In the embodiment of the present application, the expected pose corresponding to the human-machine interaction force at the current moment is determined by the human-machine collaboration model, and the desired motion of the robotic arm is generated according to the current moment pose of the robotic arm and the expected pose corresponding to the human-machine interaction force at the current moment The optimal trajectory of the robot arm is controlled by the optimal trajectory of the expected movement of the robot arm, so that the robot can move along the trajectory of human intention, so as to realize the control of the robot to accurately understand the doctor's intention and optimize the human-computer interaction experience. The technical effect of the robot further solves the problem that the robot cannot move along the trajectory of the human intention.

Description of drawings

In order to illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the specific embodiments or the prior art. Obviously, the accompanying drawings in the following description The drawings are only some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

1 is a schematic flowchart of a method for controlling a robotic arm according to an embodiment of the present application;

2 is a schematic flowchart of a method for training a human-machine collaborative model according to an embodiment of the present application;

FIG. 3 is a structural block diagram of a control device of a robotic arm according to an embodiment of the present application.

Detailed ways

In order to make those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only Embodiments are part of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second" and the like in the description and claims of the present invention and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances for the embodiments of the invention described herein. Furthermore, the terms "comprising" and "having" and any variations thereof, are intended to cover non-exclusive inclusion, for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those expressly listed Rather, those steps or units may include other steps or units not expressly listed or inherent to these processes, methods, products or devices.

In the present invention, the terms "installed", "arranged", "provided with", "connected", "connected" and "socketed" should be construed in a broad sense. For example, it may be a fixed connection, a detachable connection, or a unitary structure; it may be a mechanical connection, or an electrical connection; it may be directly connected, or indirectly connected through an intermediary, or between two devices, elements, or components. internal communication. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to specific situations.

It should be noted that the embodiments of the present invention and the features of the embodiments may be combined with each other under the condition of no conflict. The present invention will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

As shown in FIG. 1, the method includes the following steps S11 to S14:

S11: Obtain a human-machine collaboration model, wherein the human-machine collaboration model is a model for determining the desired pose of the robotic arm according to the human-machine interaction force.

The human-machine collaborative model can be a pre-stored model in the control system of the robotic arm, or a human-machine collaborative model can be obtained by training with a machine learning method, or it can be a human-machine collaborative model optimized after training with a machine learning method. In this embodiment, by way of example, the human-machine collaboration model is obtained by training through a machine learning method. For the specific training method, please refer to the description of FIG. 2 in the embodiment section below. Specifically, the human-machine collaborative model is a variety of neural network models or Gaussian process models that are pre-trained via a Gaussian Mixture Model (hereinafter referred to as GMM).

S12: Obtain the pose at the current moment, and obtain the desired pose corresponding to the human-computer interaction force at the current moment according to the human-machine collaboration model.

Among them, the human-computer interaction force can be directly obtained through the force sensor installed on the robotic arm. Specifically, the force sensor is a multi-dimensional force sensor. In this embodiment, for example, the force sensor is acquired by a three-dimensional force sensor or a six-dimensional force sensor. The obtained human-computer interaction force at the current moment is input into the human-computer collaboration model, and the predicted expected pose of the robotic arm at the next moment can be obtained. The desired pose is used to control the tangent direction of the path within the limited area, and the control method is exited when the desired pose deviates greatly. The human-computer interaction force may also be a human-computer interaction force including a human-computer resistance force. The human-machine interaction force including the human-machine impedance force can be obtained through the force sensor installed on the manipulator arm, and then the force obtained by the force sensor and the corresponding current moment pose are solved to obtain the virtual constraint of the manipulator arm (that is, the man-machine impedance force). ), so that the human-computer interaction force including the human-computer resistance force can be determined by summing the human-computer interaction force obtained by the force sensor and the virtual constraint obtained by the solution.

S13: Generate an optimal trajectory of the robotic arm period motion according to the pose at the current moment and the desired pose corresponding to the human-computer interaction force at the current moment.

"Generate the optimal trajectory of the motion of the manipulator according to the position and attitude at the current moment and the expected position and attitude corresponding to the human-computer interaction force at the current moment" is specifically: through the model predictive control (model predictive control, hereinafter referred to as MPC) algorithm, According to the pose at the current moment and the expected pose corresponding to the human-computer interaction force at the current moment, multiple groups of random trajectories are generated; the optimal trajectory is selected from the multiple groups of random trajectories.

Specifically, MPC is a model that predicts the process output in the future based on the model at the current moment, selects the objective optimization function, predicts the future output sequence and outputs the control quantity at the current moment, and the latest measured data at the next moment is the process of the previous moment. Algorithm for feedback correction of the output sequence. That is, MPC can make the human-computer interaction model at the current moment predict the expected pose output in a period of time in the future. According to the pose at the current moment and the human-machine collaboration model, the expected pose in the future time can be predicted through MPC, multiple sets of random trajectories can be generated, and the optimal trajectory of the multiple sets of random trajectories can be selected. Optionally, the optimal trajectory of the movement of the robotic arm generated in this step is the optimal trajectory of the movement of the robotic arm within the limited area, and the feature of the optimal trajectory is that the operator can control the forward and backward directions in the tangential direction; But the normal direction is controlled by the robot autonomously. Since humans have strong control ability on the tangent line, but the robot has strong control ability on the normal line, the operator transmits the desired position to the robotic arm through the man-machine collaboration model described in claim 1, and the robotic arm tracks the desired position on the path. The projected point on the to achieve the drag effect.

"Selecting the optimal trajectory from the multiple groups of random trajectories" is specifically: selecting the optimal trajectory from the multiple groups of random trajectories through an optimal trajectory control algorithm.

Specifically, the selection of the optimal trajectory may be determined by a linear quadratic regulator algorithm, a nonlinear quadratic regulator (Iterative Linear Quadratic Regulator, hereinafter referred to as iLQR) algorithm or differential dynamic programming, which is not limited here. In this embodiment, by way of example, the optimal trajectory is selected and determined by the iLQR algorithm in the optimal trajectory control algorithm. Among them, the iLQR algorithm can obtain the optimal control law of state nonlinear feedback, which is easy to form closed-loop optimal control. That is, the optimal trajectory among multiple groups of random trajectories can be determined by the iLQR algorithm. The optimal trajectory is gradually updated by an iterative optimization algorithm. When the iteration converges, the iterative trajectory is considered to be the optimal trajectory. Optionally, the movement trajectory (movement position, speed) is optimized between 10ms and 500ms according to the pose at the current moment and the expected pose of the human-computer interaction force at the current moment. Among them, the robot has a value weight about the position on the normal line to precisely control the position, and the human has a large admittance value in the tangential direction to realize the human-guided dragging. Also, humans are more controllable than robots on the tangent component, but robots are more controllable than humans on the normal component. The user transmits the desired position to the robotic arm through the human-machine collaboration model in the above step S1, and the robotic arm realizes the dragging effect by tracking the projected point of the desired position on the path.

S14: Control the robotic arm according to the optimal trajectory.

"Controlling the robotic arm according to the optimal trajectory" is specifically: acquiring the position and attitude angular motion information of the robotic arm; performing the first mode control on the normal components of the position and attitude angular motion information of the robotic arm; The tangential component of the position and attitude angle motion information of the arm is used for the second mode control; wherein, the first mode is the robot guidance mode in which the manipulator arm admittance is greater than the manipulator arm admittance of the second mode; the second mode is the human admittance A human-guided mode that is greater than the human admittance of the first mode.

Specifically, according to the robot dynamics, the error feedback amount of the manipulator is constructed by the impedance coordinate system of the actual motion of the manipulator and the desired coordinate system of the desired motion of the manipulator, as shown in formula (1):

Among them, M(q) is the inertia matrix of the manipulator in Cartesian space, the unit of the first three columns of the matrix is kg, and the unit of all the following elements is Ns ² /rad; q is the joint angle; the unit of the first three rows of x is m, The unit corresponding to all subsequent lines is rad;

is the viscosity matrix; g(q) is the gravity vector; f _env is the environment interaction force screw, which can be obtained by the force sensor of the environment-robot interaction; f is the human-computer interaction force, which can be performed by the force sensor in the above step S11 Obtain.

Based on the dynamic expression, the force controller is constructed based on the feedback linearization method to complete the inner loop performance of the manipulator with high stiffness to humans and low stiffness to the environment. The input of the inner loop is the position and attitude of the impedance coordinate system. Through the iLQR method in the above step S3, the optimal trajectory of the motion corresponding to each joint of the manipulator and the variable control parameters based on the weight of the cost matrix can be obtained as the optimal trajectory of the desired trajectory and The error feedback of the actual trajectory of the manipulator is controlled in different directions.

In the process of path tracking (tracking the path generated by pre-planning and human-computer interaction) of the robotic arm, when the position and posture of the robotic arm move in the normal sub-direction (that is, when the attitude angle and the normal direction move), according to the maximum admittance of the machine (The machine does more work) and the user has a small admittance (the human does less work) to control the robotic arm (ie the first mode); when the position and posture of the robotic arm move in the tangential direction (preferably, when the tangential direction moves ), the robotic arm is controlled (ie, the second mode) according to the user's large admittance (human work is large) and the machine small admittance (machine work is small). For example, when the predicted pose X _t+1 (0m, 0.1m, 0.1m, 0.3°, 0.1°, 0.2°) of the desired pose is obtained according to the human-machine collaboration model, it can be determined according to the large admittance of the user and the small admittance of the machine. The principle is to control the robotic arm. The first mode is a robot guidance mode, and the second mode is a human guidance mode. The first mode and the second mode can coexist, but the movement directions of the two modes are different. That is, the first mode control is performed on the robot arm in the normal direction; the second mode control is performed on the robot arm in the tangential direction.

From the above description, it can be seen that the present invention achieves the following technical effects:

The expected pose corresponding to the human-machine interaction force at the current moment is determined through the human-machine collaboration model, so that the predicted displacement of the robotic arm at the current moment and the predicted moment can be determined, and multiple sets of random trajectories of the predicted displacement can be generated through MPC, and then the optimal trajectory control algorithm can be used to generate multiple random trajectories. The optimal trajectory of multiple random trajectories is determined, and the position and attitude angle motion information of the manipulator is obtained, and the manipulator is controlled to achieve the effect of making the robot move along the trajectory intended by humans.

According to an embodiment of the present application, a method for obtaining a human-machine collaborative model in the above-mentioned control method of a robotic arm is also provided. As shown in FIG. 2 , the human-machine collaborative model training method includes the following steps S21 and S22 :

S21: Obtain multiple sets of human-computer interaction forces of the robotic arm and multiple sets of robotic arm poses corresponding to the multiple sets of human-computer interaction forces, where the multiple sets of human-computer interaction forces are multiple sets of original human-computer interaction forces;

S22: Establish a human-machine collaboration model according to the multiple sets of human-computer interaction forces and the multiple sets of robotic arm poses.

The human-computer interaction force can be directly obtained through the force sensor installed on the robotic arm. Specifically, the force sensor is a multi-dimensional force sensor. In this embodiment, for example, the human-computer interaction force is acquired by a six-dimensional force sensor. The training force group obtained by the force sensor includes three training force components and three training torque components corresponding to the X, Y, and Z axes. The pose of the robotic arm can be recorded by establishing a coordinate system of the robotic arm including the X, Y, and Z axes. Specifically, the pose of the robotic arm includes three distance movement components and three angular movement components corresponding to the X, Y, and Z axes. For example, the acquired human-computer interaction force W _t (1N, 0N, 0N, 0.1Nm, 0.2Nm, 0.3Nm), wherein 1N, 0N, 0N are three training force components, 0.1Nm, 0.2Nm, 0.3Nm are Three training torque components. The robot arm pose X _t (0.01m, 0.02m, 0.01m, 0.3°, 0.4°, 0.1°), where 0.01m, 0.02m, 0.01m are the three distance movement components corresponding to the X, Y, and Z axes, 0.3°, 0.4°, and 0.1° are three angular movement components, respectively. It should be noted that the number of groups of human-computer interaction force is the same as that of the pose of the robotic arm. For example, specifically, it is assumed that the obtained groups of human-computer interaction forces may be 3-5 groups. That is, when the obtained multiple sets of human-computer interaction forces are 3 sets, the obtained poses of the manipulator are also 3 sets. The human-computer interaction force may also be a human-computer interaction force including a human-computer resistance force. The virtual constraint (ie, human-machine impedance force) of the robotic arm is obtained by solving the force obtained by the force sensor and the corresponding pose at the current moment, so that the human-machine interaction force obtained by the force sensor and the virtual constraint obtained by the solution can be summed , to determine the human-machine interaction force including the human-machine resistance force.

The model input for training the human-machine collaboration model can be the human-machine interaction force at the current moment and the sampled value of the robot arm's pose at the current moment, and the human-machine collaboration model is trained according to the sampled values input by the model. The output is the actual pose of the end of the manipulator at the next moment. Through the input-output, the pose relationship between the current moment and the next moment of the manipulator can be obtained, so the model belongs to a kind of prediction. Specifically, the network model trained by the human-machine collaborative model may be a Gaussian Mixture Model (GMM for short), a Bayesian network model, a neural network model, etc., which is not limited here. The training method is to draw multiple trajectories by dragging the end of the robotic arm, and record the human-computer interaction force and the current posture of the robotic arm at every moment. Based on the inference relationship between the current moment and the next moment, the model can be trained through supervised learning.

Specifically, step S21 "obtaining multiple sets of human-computer interaction forces of the robotic arm and multiple sets of robotic arm poses corresponding to the multiple sets of human-computer interaction forces" may be to obtain multiple sets of human-computer interaction forces of the robotic arm in the trust region The poses of the multiple sets of robotic arms corresponding to the multiple sets of human-computer interaction forces. The trust region refers to the region where the sampling distribution ps of the force sensor of the acquired human-computer interaction force is between the preset KL divergence thresholds, where the KL divergence refers to the KL divergence between ps and the human-machine collaborative model , as shown in equation (2), the KL divergence can be expressed as:

D _KL (p _s ,p _m )≤th _KL (2)

Among them, D _KL is the KL divergence, ps is the sampling distribution of the force sensor, ps is obtained by maximum likelihood estimation; pm is the model distribution of the human-machine collaborative model, th _KL is the first preset KL divergence threshold, which can be determined by the user. set up. When it is detected that the KL divergence calculated at ps and pm is less than th _kl , it means that the two are close. That is to say, the prediction model (human-machine collaborative model) conforms to the actual robot motion. At this time, it is in the robot active mode (task execution mode). When the prediction model does not conform to the actual robot motion, the robot is in free dragging mode. The first preset KL The divergence threshold can also be obtained by the human-machine collaboration model and the machine learning method through user learning in different man-machine resistance forces (for example, the first preset KL divergence threshold can be -20).

Further, after training the model of the human-machine collaboration model, the method further includes: judging whether the human-machine collaboration model is an effective model.

Specifically, to determine whether the human-machine collaborative model is an effective model, it can be determined whether the KL divergence between ps and pm is greater than the second preset KL divergence, and if it is greater than the human-machine collaborative model is an effective model. (For example, the KL divergence between ps and pm is -35, the second preset KL divergence is -50, and the KL divergence between ps and pm is greater than the second preset KL divergence, then the human-machine collaborative model is a valid model).

Specifically, to determine whether the human-machine collaboration model is an effective model, the likelihood of the human-computer interaction force collected in the above step S1 can be calculated, and whether the likelihood is greater than the first preset likelihood threshold, if greater, Then the human-machine collaboration model is an effective model. For example, suppose that the three groups of human-computer interaction forces collected in the above step S1 are W ₁ (1N, 0.5N, 0N, 0.1Nm, 0.2Nm, 0.3Nm); W ₂ (2N, 0.5N, 0N, 0.1Nm, 0.2 Nm, 0.3Nm); W ₃ (3N, 0.5N, 0N, 0.1Nm, 0.2Nm, 0.3Nm), through W ₁ , W ₂ , W ₃ , the model likelihood = 0.3 can be obtained, and the first prediction Set whether the likelihood threshold is less than the model likelihood, if the model likelihood is greater than the model likelihood, the human-machine collaborative model is an effective model (for example, model likelihood=5, the first preset likelihood threshold= 2.5, if the model likelihood is greater than the first preset likelihood threshold, the human-machine collaborative model is an effective model).

The human-machine collaborative model is optimized according to the supervised learning method, and the optimized human-machine collaborative model is generated. The optimized human-machine collaboration model is the model under the trust region.

Using the supervised learning method to optimize the parameters of the human-machine collaborative model includes using prior information. Specifically, using the maximum likelihood principle in the supervised learning method to optimize the parameters of the human-machine collaborative model, as shown in formula (3) , the corresponding parameters of the optimized training model are:

Among them, pm is the model distribution of the human-machine collaborative model, f _h is the human-machine interaction force obtained by the force sensor, x _d is the desired pose of the robotic arm, t is the current moment, t+1 is the next moment, and θ _C is the human parameters of the machine collaboration model. Specifically, when the human-machine collaboration model is GMM, θ _C is the serial number of the sub-model; when the human-machine collaboration model is a neural network model, θ _C is the dimensionless weight of connecting nodes.

For example, after the parameters of the human-machine collaboration model are optimized, the human-machine collaboration model is optimized according to the optimized parameters. Specifically, different optimization methods are used for different modeling methods of the human-machine collaborative model. For example, when the human-machine collaborative model is GMM, the maximization (Expectation-Maximum, hereinafter referred to as EM) algorithm is used to perform the human-machine collaborative model. Optimization; when the human-machine collaboration model is a neural network, the stochastic gradient descent (hereinafter referred to as SGD) method is used to optimize the human-machine human-machine collaboration model.

Further, the embodiment of the present application also provides a control method of another mechanical arm, as shown in Figure 3, and specifically includes the following steps:

S31. Obtain a human-machine collaboration model, and perform maximum likelihood estimation on the measurement results to obtain a sampling distribution every time a plurality of continuous human-machine interaction force and moment values are collected during the human-machine dragging process.

The human-machine collaborative model is trained based on the collected data of human-machine interaction force and torque. The specific training process is as follows:

First, collect the data of human-machine interaction force and torque by dragging the "human-robot arm", and collect the force values and torque values corresponding to the three directions of xyz, such as W(1N, 0N, 0N, 0.1Nm, 0.2Nm , 0.3Nm) and the corresponding position movement X (0.01m, 0.02m, 0.01m, 0.3°, 0.4°, 0.1°), a large amount of data will be collected in practice.

Secondly, through a linear model or a neural network, the human-computer interaction force and displacement (position movement) are obtained as input, and the next moment force and torque value are output as a human-machine collaborative model. For example the inputs are Wt(1N, 0N, 0N, 0.1Nm, 0.2Nm, 0.3Nm) and Xt(0.01m, 0.02m, 0.01m, 0.3°, 0.4°, 0.1°). The output gets Wt+1(0N, 1N, 1N, 0.3Nm, 0.1Nm, 0.2Nm)

Thirdly, the human-machine collaboration model is optimized by the related training methods of model-based reinforcement learning, such as stochastic gradient descent and variational inference.

Finally, based on the current model parameters, that is, the weights of the neural network, the fixed parameters are then executed from the first step.

The human-machine collaborative model is finally obtained through multiple cycles of training in the preceding four steps.

After the human-machine collaboration model is obtained, "each time a plurality of continuous human-machine interaction forces and torque values are collected during the human-machine dragging process, the maximum likelihood estimation is performed on the measurement results to obtain the sampling distribution." The maximum likelihood estimation of the trajectory is performed. For example, during the human-machine dragging process, the maximum likelihood estimation of the measurement results is performed by collecting 3-5 continuous human-machine interaction force and torque values. Specifically, three consecutive groups of human-computer interaction force and torque values input from t0 to t can be collected W1 (1N, 0.5N, 0N, 0.1Nm, 0.2Nm, 0.3Nm); W2 (2N, 0.5N, 0N, 0.1 Nm, 0.2Nm, 0.3Nm); W3 (3N, 0.5N, 0N, 0.1Nm, 0.2Nm, 0.3Nm), the distribution obeyed by W is obtained through maximum likelihood estimation, assuming Gaussian distribution, the expectation is (1N, 0.5 N, 0N, 0.1Nm, 0.2Nm, 0.3Nm), the variance is diag(0.1N, 0.05N, 0N, 0.1Nm, 0.1Nm, 0.03Nm), and diag represents a diagonal matrix.

S32. Based on the multiple human-computer interaction force and moment values generated by the human-computer collaboration model, perform maximum likelihood estimation to obtain the model distribution.

For the example of the above steps, that is, the distribution of the trajectory of the force calculated by the human-machine collaborative model at time t, assume a Gaussian distribution, the expectation is (2N, 0.5N, 0N, 0.1Nm, 0.2Nm, 0.3Nm), and the variance is diag (0.2N, 0.05N, 0N, 0.1Nm, 0.1Nm, 0.03Nm), diag represents a diagonal matrix.

S33. Calculate the KL divergence between the sampling distribution and the model distribution.

Calculate the KL divergence between the distribution obtained in the above steps S31 and S32 and the model distribution, and specifically calculate the distance between the two distributions through the KL divergence.

S34. Compare the value of the KL divergence with the trustworthy threshold, and determine whether the robotic arm executes the free drag mode or the task execution mode according to the comparison result.

If the KL divergence is within the trustworthy threshold, that is, the value of the KL divergence is less than or equal to the trustworthy threshold, that is, the distance between the sampling distribution and the model distribution is small, then the human-machine collaborative model is an effective model, and this mode is the execution Task mode; if the KL divergence is not within the trustworthy threshold, that is, the value of the KL divergence is greater than the trustworthy threshold, that is, the distance between the sampling distribution and the model distribution is large, the human-machine collaboration model is an invalid model, and this mode is free Drag mode.

The trustworthy threshold is the corresponding threshold in the trustworthy region, and the specific KL divergence can be expressed as:

D _KL ( _p _s ,pm )≤th _KL

where D is the KL divergence, which represents the distance between two Gaussian distributions. Ps is the pose distribution (ie model distribution) obtained by the force sensor of human-computer interaction force through the man-machine collaborative model, Pm is the actual pose distribution (ie sampling distribution) obtained from the actual path obtained by maximum likelihood estimation; th _KL is the pre- The set KL divergence threshold can be set by the user. When it is detected that the KL divergence calculated at Ps and Pm is less than th _KL , it means that the two are close. That is, the model calculation conforms to the actual robot motion, and the robot is in the active mode at this time. When the model calculation does not conform to the actual robot motion, the robot is in the free drag mode.

In addition, it should be noted that the task execution mode is that the robot arm performs specific operations on the operation object according to the plan. It can be seen that the method of the embodiment of the present application can judge which mode is currently based on the judgment of the KL divergence between the sampling distribution and the model distribution, so as to realize free switching between the two modes. In the task execution mode, the human-machine collaborative model provides assistance for trajectory control, and can also switch between the two modes; in the free drag mode, the human-machine collaborative model does not provide assistance for trajectory control, only for the two modes. Toggle comes into play. The validity and invalidity of the model in this application are differentiated according to whether it can provide help for control. If the model can provide help, it is effective, and if it cannot provide help, it is invalid.

In the execution task mode, the human-machine collaboration model is the effective model, so the control flow in the execution task mode is described in detail.

Specifically, in the task execution mode, the desired pose corresponding to the human-computer interaction force at the current moment is obtained according to the human-machine collaboration model, and the desired pose is recorded as the first pose; the desired pose corresponding to the trajectory planned by the host computer is obtained. , denoted as the second pose; determine the pose of the robot arm in the impedance coordinate system based on the difference between the first pose and the second pose, and then determine the optimal trajectory; control.

The human-computer interaction force can be obtained through a force sensor. For example, a force sensor is set at the handle that realizes the human-computer interaction, and the human-computer interaction force can be collected through the force sensor. After inputting the human-computer interaction force and the current pose, the expected pose at the next moment can be predicted, and the desired pose is recorded as the first pose

Among them, it should be noted that the impedance coordinate system refers to the coordinate system after deformation (change), and a specific example is given for illustration: obstacles may be encountered in the actual operation process, so it is necessary to bypass obstacles, The coordinates corresponding to the obstacles that can be bypassed are the coordinates in the impedance coordinate system.

Among them, the optimal trajectory is generated by the iLQR method, and the optimal trajectory is obtained to realize the update of the target point of the manipulator, which belongs to the outer loop control. After the update of the target point is achieved, the inner loop control is required, that is, the manipulator is controlled according to the optimal trajectory. Specifically: obtain the pose of the manipulator in the impedance coordinate system (the output of the outer loop control); move in the attitude angle and normal direction, plan the large admittance according to the upper computer, and control the human dragging the small admittance; move in the tangential direction , according to the person dragging the large admittance, the host computer plans the small admittance to control. The inner loop control is based on the dynamic model of the robot arm itself (robot dynamics model). The inner loop control can get the actual torque of the robot arm. The output of the outer loop is the input of the inner loop, and the output of the inner loop is the input of the outer loop. For the control of the inner loop and the outer loop, see Figure 4, where the doctor interaction model (ie, the human-machine collaboration model) is the outer loop control, and the human-machine collaboration impedance model and the robot model (robot dynamics model) are the inner loop control. . The feedforward controller, feedback controller, and kh as a whole correspond to the iLQR method. The human-machine cooperation impedance model is explained: the human-machine impedance model is constructed by the iLQR method to control the errors of the desired coordinate system and the impedance coordinate system in different directions.

In addition, it should be noted that MPC is a model based on the current moment to predict the process output for a period of time in the future. It selects the objective optimization function, predicts the future output sequence and outputs the control amount at the current moment. The algorithm for feedback correction of the process output sequence at the moment. That is, in this embodiment, the MPC can generate a series of control quantities by means of the man-machine collaboration model and make the total cost of the trajectory corresponding to the control quantity (torque of the manipulator)-state (the pose of the manipulator) to be the lowest. According to the pose at the current moment and the human-machine collaboration model, the expected pose in the future time can be predicted through the human-machine collaboration model, multiple sets of random trajectories can be generated, and the optimal trajectory of the multiple sets of random trajectories can be selected.

Finally, the effects of the embodiments of the present application are summarized:

1. Control the position and speed of the robot model by constructing an inverse dynamics method and a gravity compensation method (robot dynamics model). The control mode is the torque control mode of the robot, and its purpose is to track the trajectory generated by the impedance model.

2. By constructing a human-machine cooperative impedance model, that is, a second-order stiffness damping model, the physical characteristics of the interaction between humans and machines are represented. This characteristic is a fixed and time-invariant differential equation, and the purpose is to maintain the feel during dragging Consistent.

3. By designing a feedforward controller and a feedback controller based on the linear controller theory and related methods, the linear controller is updated by using the value function of labor-saving in the main direction, human labor-saving in the auxiliary direction and the smallest error.

4. Based on the observation of the online doctor interaction dynamic model (doctor interaction model), the doctor interaction model is compared with the nominal dynamics (planned trajectory), so as to obtain the doctor's intention and realize a certain mode switch.

5. The control method of the present application can be applied in the medical field, such as orthopedic surgery and puncture in the surgical field.

According to an embodiment of the present invention, a device 10 for implementing the above-mentioned control method of a robotic arm is also provided. As shown in FIG. 5 , the control device 10 of the robotic arm includes:

The model obtaining module 11 is used for obtaining a human-machine collaboration model, wherein the human-machine collaboration model is a model for determining the desired pose of the robotic arm according to the human-machine interaction force;

The pose obtaining module 12 is configured to obtain the pose at the current moment, and obtain the desired pose corresponding to the human-computer interaction force at the current moment according to the human-machine collaboration model;

A trajectory generation module 13, configured to generate the optimal trajectory of the motion of the robotic arm according to the current moment posture and the desired posture and posture corresponding to the human-computer interaction force at the current moment;

The control module 14 is configured to control the robotic arm according to the optimal trajectory.

Further, the model acquisition module 11 includes:

Further, the trajectory generation module 13 includes:

Further, the optimal trajectory generation unit further includes:

Further, the control module 14 includes:

Specifically, for the implementation of each module in this embodiment, reference may be made to the relevant implementation in the method embodiment, and details are not described again.

From the above description, it can be seen that the application has achieved the following technical effects:

Obviously, those skilled in the art should understand that the above-mentioned modules or steps of the present invention can be implemented by a general-purpose computing device, and they can be centralized on a single computing device or distributed in a network composed of multiple computing devices Alternatively, they can be implemented with program codes executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they can be made into individual integrated circuit modules, or they can be integrated into The multiple modules or steps are fabricated into a single integrated circuit module. As such, the present invention is not limited to any particular combination of hardware and software.

It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, such as a set of computer-executable instructions, and, although a logical sequence is shown in the flowcharts, in some cases, Steps shown or described may be performed in an order different from that herein. Different embodiments may also refer to or be combined with each other.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, various modifications and variations can be made by those skilled in the art without departing from the spirit and scope of the present invention, and such modifications and variations fall within the scope of the appended claims within the limited range.

Claims

A method for controlling a robotic arm, comprising:

Obtaining a human-machine collaboration model, wherein the human-machine collaboration model is a model for determining the desired pose of the robotic arm according to the human-machine interaction force;

Obtain the pose at the current moment, and obtain the desired pose corresponding to the human-computer interaction force at the current moment according to the human-machine collaboration model;

Generate the optimal trajectory of the motion of the robotic arm according to the pose at the current moment and the desired pose corresponding to the human-computer interaction force at the current moment;

The robotic arm is controlled according to the optimal trajectory.
The method for controlling a robotic arm according to claim 1, wherein the generating the optimal trajectory of the robotic arm motion according to the pose at the current moment and the desired pose corresponding to the human-machine interaction force at the current moment comprises:

Through the model predictive control MPC algorithm, multiple groups of random trajectories are generated according to the pose at the current moment and the expected pose corresponding to the human-computer interaction force at the current moment;

An optimal trajectory is selected from the set of random trajectories.
The control method for a robotic arm according to claim 2, wherein the selecting an optimal trajectory from the multiple groups of random trajectories comprises:

From the sets of random trajectories, an optimal trajectory is selected through an optimal trajectory control algorithm.
The method for controlling a robotic arm according to claim 1, wherein the controlling the robotic arm according to the optimal trajectory comprises:

Obtain the position and attitude angle motion information of the robotic arm;

The first mode control is performed on the normal component of the position and attitude angle motion information of the robotic arm;

The second mode control is performed on the tangential components of the position and attitude angle motion information of the robotic arm; wherein, the first mode is a robot guidance mode in which the robotic arm admittance is greater than the robotic arm admittance of the second mode; so The second mode is a human-guided mode in which the human admittance is greater than the human admittance of the first mode.
The method for controlling a robotic arm according to claim 1, wherein after acquiring the man-machine collaboration model, the method further comprises:

In the process of human-machine dragging, each time a plurality of continuous human-machine interaction force and moment values are collected, the maximum likelihood estimation is performed on the measurement results to obtain the sampling distribution;

Based on the multiple human-machine interaction forces and torque values generated by the human-machine collaborative model, the maximum likelihood estimation is performed to obtain the model distribution;

calculating the KL divergence between the sampling distribution and the model distribution;

The value of KL divergence is compared with the trustworthy threshold, and according to the comparison result, it is judged that the robot arm executes the free dragging mode or the executing task mode.
The method for controlling a robotic arm according to claim 5, wherein the comparison between the value of the KL divergence and the trustworthy threshold, and judging that the robotic arm executes the free drag mode or the execution task mode according to the comparison result comprises:

If the value of the KL divergence is less than or equal to the trustworthy threshold, the robot arm is judged to perform the task mode;

If the value of the KL divergence is greater than the trustworthy threshold, it is determined that the robotic arm is in the free dragging mode.
The control method of the manipulator according to claim 6, wherein, if the manipulator executes the free drag mode, the man-machine collaboration model is an invalid model;

If the manipulator executes the task mode, the human-machine collaborative model is the valid model.
The control method of a robotic arm according to claim 7, wherein if the robotic arm executes the task mode, the robotic arm is generated according to the current moment pose and the desired pose corresponding to the human-machine interaction force at the current moment. The optimal trajectory of the movement.
The method for controlling a robotic arm according to claim 8, wherein the generating the optimal trajectory of the robotic arm motion according to the pose at the current moment and the desired pose corresponding to the human-machine interaction force at the current moment comprises:

Obtain the desired pose corresponding to the human-machine interaction force at the current moment based on the human-machine collaboration model, and record it as the first pose;

Obtain the desired pose corresponding to the trajectory planned based on the host computer, and record it as the second pose;

The pose of the manipulator in the impedance coordinate system is determined based on the difference between the first pose and the second pose, and then the optimal trajectory is determined.
The method for controlling a robotic arm according to claim 9, wherein the controlling the robotic arm according to the optimal trajectory comprises:

Obtain the pose of the robotic arm in the impedance coordinate system;

In attitude angle and normal motion, the large admittance is planned according to the host computer, and the human drags the small admittance to control;

Move in the tangential direction, according to the person dragging the large admittance, the host computer plans the small admittance for control.
The method for controlling a robotic arm according to claim 9, wherein the generating the optimal trajectory of the robotic arm motion according to the current moment pose and the desired pose corresponding to the human-machine interaction force at the current moment comprises:

The optimal trajectory of the robot arm movement is generated based on the optimal control method iLQR method.
A training method of a man-machine collaborative model, used to obtain the man-machine collaborative model of the control method of a robotic arm according to any one of claims 1-10, the training method of the man-machine collaborative model comprising:

Obtaining multiple sets of human-computer interaction forces of the robotic arm and multiple sets of robotic arm poses corresponding to the multiple sets of human-computer interaction forces, where the multiple sets of human-computer interaction forces are multiple sets of original human-computer interaction forces;

A human-machine collaboration model is established according to the multiple sets of human-computer interaction forces and the multiple sets of robotic arm poses.
The method for training a human-machine collaborative model according to claim 12, wherein after the human-machine collaborative model is established according to the multiple sets of human-computer interaction forces and the multiple sets of robotic arm poses, the method further comprises: include:

The human-machine collaborative model is optimized according to the supervised learning method, and the optimized human-machine collaborative model is generated.
The training method for a human-machine collaborative model according to claim 12, wherein the acquiring multiple sets of human-computer interaction forces of the robotic arm comprises: acquiring multiple sets of human-computer interaction forces when a person operates the handle through a six-dimensional force sensor ;

Obtaining the poses of the multiple sets of robotic arms corresponding to the multiple sets of human-computer interaction forces includes:

The upper computer and the path planning system are obtained, in the form of the expected pose of the manipulator in the impedance coordinate system.
A control device for a robotic arm, comprising:

a model obtaining module, used for obtaining a human-machine collaboration model, wherein the human-machine collaboration model is a model for determining the desired pose of the robotic arm according to the human-machine interaction force;

a pose obtaining module, configured to obtain the pose at the current moment, and obtain the desired pose corresponding to the human-computer interaction force at the current moment according to the human-machine collaboration model;

A trajectory generation module, configured to generate an optimal trajectory of the motion of the robotic arm according to the current moment posture and the expected posture and posture corresponding to the human-computer interaction force at the current moment;

The control module is used to control the robotic arm according to the optimal trajectory.
The control device for a robotic arm according to claim 15, wherein the trajectory generation module comprises:

a random trajectory generation unit, configured to generate multiple groups of random trajectories through the MPC algorithm according to the current moment posture and the expected posture corresponding to the human-computer interaction force at the current moment;

The optimal trajectory generation unit selects the optimal trajectory from the multiple groups of random trajectories.
A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause the computer to execute the control method and /or the training method of the human-machine collaborative model according to any one of claims 12 to 14.
A robot comprising: a robotic arm, a sensor, at least one processor; and a memory connected in communication with the at least one processor; wherein the memory stores a computer program executable by the at least one processor, The computer program is executed by the at least one processor, so that the at least one processor executes the control method for a robotic arm according to any one of claims 1-11 and/or any one of claims 12 to 14. The training method of the human-machine collaborative model.