CN117944049A - Robot kinematics parameter correction method and system based on reinforcement learning - Google Patents
- Publication number: CN117944049A
- Application number: CN202410149562.6A
- Authority: CN (China)
- Prior art keywords: robot, rcm, motion, parameter, kinematic
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; safety devices combined with or specially adapted for use in connection with manipulators
- B25J19/0095—Means or methods for testing manipulators
Abstract
The invention relates to a reinforcement-learning-based robot kinematic parameter correction method, comprising the following steps: establishing a Q-learning model; with probability ε, randomly executing an action that modifies a kinematic parameter, and with probability 1−ε executing the action that maximizes the Q table in the current state, then modifying the corresponding kinematic parameter; controlling the robot to execute an RCM motion; photographing the tip of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and identifying its image coordinates; calculating the new state S′ and the reward r obtained by taking the action; updating the Q table in the new state S′; judging whether the number of training episodes meets the requirement, and if not, training again; if so, executing the next step; and, if the average positioning accuracy of the execution-element tip meets the threshold requirement, obtaining the optimal parameters and saving the output Q table. Error values are obtained without additional measuring equipment, reducing equipment cost; no conversion calibration from the robot base to the end effector is needed, simplifying the calculation process.
Description
Technical Field
The invention relates to the field of robot control, in particular to a robot kinematics parameter correction method and system based on reinforcement learning.
Background
Ophthalmic surgery generally requires highly accurate procedures to address vision disorders and other ocular diseases such as cataracts, myopia, and hyperopia. Traditional ophthalmic surgery often relies on the surgeon's manual manipulation and may be affected by factors such as hand tremor.
In recent years, ophthalmic surgical robotics has become an important field for addressing these challenges. An ophthalmic surgical robot system can provide high-precision operation, reduce surgical risk, and improve the surgical success rate. However, ophthalmic surgical robots are subject to manufacturing and assembly errors, resulting in deviations of the actual parameters of the robotic arm from their nominal values. These parameter errors may lead to positioning and operating errors during surgery, reducing the accuracy and success rate of the operation.
In the prior art, different parameter calibration schemes are adopted for different surgical robots. Some surgical robots use purpose-built hardware for parameter calibration; others use self-developed algorithms to correct errors from low-cost measurements. For ophthalmic surgical robots, current calibration mainly relies on repeatedly modifying parameters in small increments based on manual experience until the precision requirement is met.
Each of these approaches has drawbacks. Manual adjustment based on experience requires no sensor equipment, but it takes a long time and needs professional calibration personnel. Parameter calibration systems that measure position with a laser tracker are expensive and must be recalibrated periodically. Camera-based calibration introduces new errors that are superimposed in the forward and inverse kinematic calculations, increasing the complexity of the error model and reducing the accuracy of kinematic parameter calibration. Fitting the robot's kinematic parameters with a trained neural network model requires data collection, cleaning, model training, and testing, which is a large workload.
Disclosure of Invention
To solve the prior-art problems of expensive measuring equipment, repeated calibration, and complex calculation, the invention provides a reinforcement-learning-based robot kinematic parameter correction method that reduces cost, simplifies operation, and corrects the robot's kinematic parameters.
To solve these technical problems, the invention adopts the following technical scheme. A reinforcement-learning-based robot kinematics parameter correction method comprises the following steps:
Step one: selecting the kinematic parameters to be modified, their initial values, the number of training episodes T, the initial state S0, and the reward function, and establishing a Q-learning model;
Step two: with probability ε, randomly executing an action that modifies a kinematic parameter; with probability 1−ε, executing the action that maximizes the Q table in the current state; and modifying the current value of the corresponding kinematic parameter. The current value is the value of the kinematic parameter at the moment step two is executed; when step two is executed for the first time, the current value is the initial value.
Step three: controlling the robot to execute an RCM motion. The Remote Center of Motion (RCM) is a key point in robotic surgery for keeping the surgical tool stable: to prevent enlargement of the patient's incision during surgery, the robot is allowed only rotational or translational movement about this point.
Step four: photographing the tip of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and identifying the tip coordinates of the execution element; calculating the new state S′ from the coordinates before and after the RCM motion, and calculating the reward r obtained by the action according to the reward function;
Step five: in the new state S′, assuming the action a′ that maximizes the Q table is executed, and updating the Q table using the Markov property;
Step six: judging whether the number of training episodes has reached T; if not, executing steps two to five again; if so, executing step seven;
Step seven: comparing the current values of the kinematic parameters with the manually corrected parameters; if the average positioning accuracy of the execution-element tip meets the threshold requirement, the optimal parameters are obtained and the output Q table is saved.
In the above technical solution, the tip of the robot's execution element serves as the fixed RCM point, and the images captured by the two cameras serve as the basis for judging the RCM point offset. Reinforcement learning is then performed: following the ε-greedy strategy, the robot executes the corresponding kinematic-parameter modification action in each state until the judged RCM point offset meets the threshold requirement, i.e., the tip error of the execution element is smaller than the preset threshold. At that point the optimal values of the robot's kinematic parameters are obtained, and subsequent adjustment can be performed according to the Q table.
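To make steps two to six concrete, the following is a minimal sketch of the ε-greedy Q-learning loop described above. The interfaces `initial_state`, `apply_action`, `execute_rcm_motion`, and `observe` are hypothetical stand-ins for the robot, the two cameras, and the reward computation; the linear ε decay and the α, γ values are placeholders, since the patent's exact formulas are not reproduced in this text:

```python
import numpy as np

N_STATES, N_ACTIONS = 81, 13   # 81 difference states; 13 actions in the embodiment below
ALPHA, GAMMA = 0.1, 0.9        # decay factors in the Q-table update (placeholder values)
T = 1000                       # number of training episodes (placeholder)

Q = np.zeros((N_STATES, N_ACTIONS))

def train(env):
    """env is a hypothetical wrapper around the robot, the two cameras,
    and the reward function described above."""
    s = env.initial_state()
    for i in range(T):
        eps = 1.0 - i / T                    # placeholder schedule; the source formula is omitted
        if np.random.rand() < eps:
            a = np.random.randint(N_ACTIONS)  # explore: random parameter-modification action
        else:
            a = int(np.argmax(Q[s]))          # exploit: best known action in the current state
        env.apply_action(a)                   # step two: modify the kinematic parameter
        env.execute_rcm_motion()              # step three: run the RCM motion
        s_new, r = env.observe()              # step four: new state S' and reward r from images
        a_new = int(np.argmax(Q[s_new]))      # step five: assumed best action a' in S'
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_new, a_new] - Q[s, a])
        s = s_new
    return Q
```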
Preferably, the Q-learning model is as follows:
State space: before the robot executes the RCM motion, the image coordinates read from the two cameras are (x01, y01) and (x02, y02); after the RCM motion, the images are re-captured and the execution-element tip is identified, giving tip coordinates (xnew1, ynew1) and (xnew2, ynew2) in the two camera images. The x and y image coordinates before and after the RCM motion are differenced; each difference has three possible results (increased, decreased, unchanged), so the difference results of the four parameters generate 3×3×3×3 = 81 states, defined as states 0 to 80, i.e., eighty-one states;
Action space: a correction precision value and a modification mode are set for each kinematic parameter, the modification mode being to increase or decrease the current value of the kinematic parameter by the correction precision value; increasing or decreasing each kinematic parameter, or taking no action, each constitutes one action in the action space;
Reward function: the minimum RCM offset observed during training is recorded as dis_best. When the error RCM_err of the current training episode is smaller than dis_best, a reward is given (formula omitted in the source), where k is a fixed value adjusted according to the magnitude of RCM_err in the training environment; when RCM_err is greater than dis_best but smaller than the error of the previous episode, a reward is given (formula omitted in the source); if the error is larger than that of the previous episode, the corresponding reward is given (formula omitted in the source).
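Since the exact reward formulas are not reproduced in this text, the sketch below implements one plausible reward shape consistent with the description (largest reward for a new best offset, a smaller reward for an improvement over the previous episode, a penalty otherwise). The functional forms and the penalty value are assumptions:

```python
def reward(rcm_err, dis_best, last_err, k=1.0):
    """Plausible reward shape; the patent's exact formulas are omitted in this text.
    k is a fixed value tuned to the magnitude of RCM_err in the training environment."""
    if rcm_err < dis_best:
        return k / (rcm_err + 1e-9)         # new best offset: largest reward (assumed form)
    elif rcm_err < last_err:
        return 0.5 * k / (rcm_err + 1e-9)   # better than the previous episode (assumed form)
    else:
        return -k                           # worse than the previous episode: penalty (assumed)
```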
Preferably, the RCM offset is defined as follows: in each training episode, a number of RCM motions is set; the Euclidean distance is used to judge the RCM offset, and the average offset over all RCM motions is taken as the offset of that episode:
RCM_err = (xnew1 − x01)² + (ynew1 − y01)² + (xnew2 − x02)² + (ynew2 − y02)²
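A direct transcription of this definition (function and variable names chosen here for illustration):

```python
def rcm_offset(before, after):
    """Squared-distance RCM offset between the tip image coordinates read before
    and after the RCM motion; before = (x01, y01, x02, y02), after likewise."""
    x01, y01, x02, y02 = before
    xn1, yn1, xn2, yn2 = after
    return (xn1 - x01)**2 + (yn1 - y01)**2 + (xn2 - x02)**2 + (yn2 - y02)**2
```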
Preferably, the specific value of ε is given by a formula (omitted in the source) in which i = 0, 1, 2, …, T, where T is the number of training episodes.
Preferably, in the fifth step, the Q table is updated according to the following formula:
Q(s,a)=Q(s,a)+α(r+γQ(s′,a′)-Q(s,a))
where s is the current state of the robot, a is the action currently taken, α and γ are decay factors, r is the reward given by the environment after the robot takes the action, s′ is the state at the next moment, and a′ is the action assumed to be taken in s′.
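In code this update is a single line; a minimal sketch, with `Q` a NumPy array indexed by (state, action) and placeholder α, γ values:

```python
import numpy as np

def q_update(Q, s, a, r, s_new, alpha=0.1, gamma=0.9):
    """Q-learning update; a' is taken as the action maximizing Q in the new state s'."""
    a_new = int(np.argmax(Q[s_new]))
    Q[s, a] += alpha * (r + gamma * Q[s_new, a_new] - Q[s, a])
    return Q
```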
Preferably, in step one, the specific process of selecting the kinematic parameters to be modified is as follows: construct the expression of the robot's execution tip in the base coordinate system, taking the kinematic parameters appearing in the expression as candidate parameters; manually adjust the candidate parameters, check the tuning results, and select the kinematic parameters to be modified according to the results. This determines the kinematic parameters that most need correction and whose correction improves the robot's RCM point offset, reducing the number of parameters to correct and the amount of calculation.
Preferably, in step two, if the value of a kinematic parameter after an action modification deviates from its corresponding initial value by more than 5%, the modification is executed in the reverse direction with a certain probability, to prevent overfitting. The probability is given by a function f(L_modified − L_initial) (formula omitted in the source), where q is a positive number chosen so that f lies in [0, 1]; L_modified is the value of the modified kinematic parameter and L_initial is its initial value.
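Since the exact form of f is not reproduced in this text, the sketch below assumes a monotone probability clipped to [0, 1]; the function name, the form of f, and the reversal behavior are all assumptions:

```python
import random

def maybe_reverse(value, initial, delta, q=0.05):
    """After a modification by `delta`, if the parameter deviates from its initial
    value by more than 5%, reverse the modification with probability f.
    The source's exact f is omitted; a clipped linear form is assumed here."""
    if abs(value - initial) > 0.05 * abs(initial):
        f = min(1.0, q * abs(value - initial))   # assumed monotone f, clipped to [0, 1]
        if random.random() < f:
            return value - 2.0 * delta           # undo the step and apply it in reverse
    return value
```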
Preferably, in step four, the tip coordinates of the execution element are obtained using Harris corner detection.
Preferably, in step five, the tip of the robot's execution element is initialized to the center of the two camera images, preventing the tip from drifting out of the image area over many training episodes.
A system for robot kinematic parameter correction, configured to perform the reinforcement-learning-based robot kinematic parameter correction method described above; it comprises a robot, two mutually perpendicular cameras for photographing the robot tip, and a controller electrically connected to the robot and the cameras respectively.
A computer readable storage medium storing a computer program which when executed by a processor implements the reinforcement learning-based robot kinematic parameter correction method described above.
Compared with the prior art, the invention has the following beneficial effects: the cameras acquire images of the execution-element tip and the RCM point offset is judged from the images, so no additional measuring equipment is needed to measure error values, which reduces equipment cost; constraining the method with the execution-element tip as a fixed RCM point, combined with reinforcement learning, removes the need for coordinate conversion from the robot base to the end effector, calibration of camera intrinsic and extrinsic parameters, hand-eye calibration, and similar procedures, which simplifies the calculation; and because the time, labor cost, and repetition of parameter calibration are reduced, the operating threshold of parameter calibration is lowered.
Drawings
FIG. 1 is a flow chart of a robot kinematic parameter correction method based on reinforcement learning of the present invention;
Fig. 2 is a schematic diagram of a robot kinematic parameter correction system based on reinforcement learning according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting this patent. For the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged, or reduced, and do not represent actual product dimensions; certain well-known structures and their descriptions may also be omitted. The positional relationships depicted in the drawings are for illustration only.
The same or similar reference numbers in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, orientation or positional relationships indicated by terms such as "upper", "lower", "left", "right", "long", and "short" are based on the orientations shown in the drawings; they are used only for convenience and simplification of description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore exemplary, are not to be construed as limiting this patent, and their specific meaning can be understood by those of ordinary skill in the art according to the circumstances.
The technical scheme of the invention is further described below through specific embodiments with reference to the accompanying drawings:
Example 1
Fig. 1 shows an embodiment of a robot kinematic parameter correction method based on reinforcement learning, which includes the following steps:
Step one: selecting the kinematic parameters to be modified, their initial values, the number of training episodes T, the initial state S0, and the reward function, and establishing a Q-learning model;
The kinematic parameters having the greatest influence on the RCM point offset are selected as the parameters to be modified. The specific flow is: construct the expression of the robot's execution tip in the base coordinate system, taking the kinematic parameters appearing in the expression as candidate parameters; manually adjust the candidate parameters, check the tuning results, and select the parameters to be modified according to the results. This determines the kinematic parameters that most need correction and whose correction improves the robot's RCM point offset, reducing the number of parameters to correct and the amount of calculation.
In this embodiment, a robot with a Parallel Coupled Joint Mechanism (PCJM) is taken as an example. A PCJM converts the differential displacement of two translational motions into one translation and one rotation. The parameters are: point A is the coordinate origin; L1 and L2 are the linear displacements of the movable joints at the ends of the upper and lower motors; dm is the vertical distance between the two parallel motors; h is the length of the end effector; Ltool is the length from the midpoint of the actuator to the tip of the execution element; and L and θ are the linear displacement and rotation angle of the whole PCJM, which satisfy equation (1) (omitted in the source).
From the forward kinematic analysis, the transformation of the PCJM from base point A to end point D can be derived as:
X_D = cos(θ)·X_A − sin(θ)·Y_A + L − Ltool·sin(θ) + h·cos(θ)  (2)
Y_D = sin(θ)·X_A + cos(θ)·Y_A + Ltool·cos(θ) + h·sin(θ)  (3)
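Equations (2) and (3) translate directly into code; a sketch, with argument names chosen here for illustration:

```python
import math

def pcjm_end_point(xa, ya, L, theta, ltool, h):
    """End point D of one PCJM from base point A, per equations (2) and (3)."""
    xd = (math.cos(theta) * xa - math.sin(theta) * ya
          + L - ltool * math.sin(theta) + h * math.cos(theta))
    yd = (math.sin(theta) * xa + math.cos(theta) * ya
          + ltool * math.cos(theta) + h * math.sin(theta))
    return xd, yd
```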
Taking the partial derivatives of X_D, Y_D, and θ with respect to L1, L2, and L5 gives the Jacobian matrix J1 (formula (4) omitted in the source).
The series-parallel hybrid robot connects two PCJMs end-to-end in direct series through mechanical components, with one PCJM rotated, and adds a tool holder at the end; the PCJM near the base is the first (PCJM1) and the one near the end is the second (PCJM2). The translational displacement and rotation angle of the first PCJM are La and θa, where La corresponds to the displacements L1 and L2 of its upper and lower parallel motors; those of the second PCJM are Lb and θb, corresponding to the displacements L3 and L4 of its upper and lower parallel motors. A vertically movable motor, displaced by L5, is added at the tool holder. In the overall serial-parallel robot configuration, Ltool and L5 of the first PCJM are 0, and the second PCJM is mounted on the end of the first after a 90° rotation about the y-axis. From the forward kinematic derivation, the expression of the robot tip P = (PX, PY, PZ) in the base coordinate system can be obtained as equation (5) (omitted in the source).
As can be seen from equation (5), when the tip point is taken as the RCM fixed point, an error in any kinematic parameter will cause an offset, and the RCM point cannot be guaranteed to stay fixed; the kinematic parameters therefore need correction. Mainly the five kinematic parameters Ltool, h1, h2, dm, and dm2 are taken as the candidate parameters; their physical meanings are as follows:
Ltool: the distance from the end of the executing element to the central line between the upper axis and the lower axis of the PCJM parallel moving joint L3 and the L4 is the distance from the end of the executing element to the central line between the upper axis and the lower axis of the PCJM parallel moving joint L3 and the lower axis.
H1: PCJM2 to the midpoint of the ends of the two parallel-translation joints L3, L4.
H2: PCJM1 to the midpoint of the ends of the two parallel-translation joints L1, L2.
Dm: the axial distance between the two parallel moving joints L3 and L4.
Dm2: the axial distance between the two parallel-moving joints L1 and L2.
Based on these candidate parameters, the parameters are manually adjusted and the tuning results checked; according to the results, the kinematic parameters selected for modification are Ltool, h1, and h2.
The Q-learning model is as follows:
State space: before the robot executes the RCM motion, the image coordinates read from the two cameras are (x01, y01) and (x02, y02); after the RCM motion, the images are re-captured and the execution-element tip is identified, giving tip coordinates (xnew1, ynew1) and (xnew2, ynew2) in the two camera images. The x and y image coordinates before and after the RCM motion are differenced; each difference has three possible results (increased, decreased, unchanged), so the difference results of the four parameters generate 3×3×3×3 = 81 states, defined as states 0 to 80, i.e., eighty-one states;
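One natural encoding of these 81 states is a base-3 index over the signs of the four coordinate differences; the specific numbering below is an assumption, since the patent's exact assignment of states 0 to 80 is not given in this text:

```python
def state_index(before, after, tol=1e-6):
    """Map the four coordinate differences to a state index in 0..80.
    before = (x01, y01, x02, y02); after = (xnew1, ynew1, xnew2, ynew2)."""
    idx = 0
    for b, a in zip(before, after):
        d = a - b
        trit = 0 if abs(d) <= tol else (1 if d > 0 else 2)  # unchanged / increased / decreased
        idx = idx * 3 + trit
    return idx  # 3**4 = 81 possible states
```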
Action space: a correction precision and a modification mode are set for each kinematic parameter, the modification mode being to increase or decrease the parameter by the correction precision; increasing or decreasing each parameter, or taking no action, each constitutes one action in the action space. The correction precision can take two different values; in this embodiment the correction precision for Ltool is 0.5 mm or 1 mm, and for h1 and h2 it is 0.1 mm or 0.3 mm. Each modification therefore increases or decreases the current value of Ltool by 0.5 or 1 mm, or the current values of h1 and h2 by 0.1 or 0.3 mm; together with the no-op action, the action space contains 3×2×2+1 = 13 actions.
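With the embodiment's precision values, the 13 actions can be enumerated explicitly (the ordering below is arbitrary):

```python
# (parameter, delta in mm); one no-op plus 3 parameters x 2 directions x 2 precisions = 13 actions
ACTIONS = [("noop", 0.0)]
for param, precisions in (("Ltool", (0.5, 1.0)), ("h1", (0.1, 0.3)), ("h2", (0.1, 0.3))):
    for p in precisions:
        ACTIONS.append((param, +p))
        ACTIONS.append((param, -p))

assert len(ACTIONS) == 13
```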
Reward function: the minimum RCM offset observed during training is recorded as dis_best. When the error RCM_err of the current training episode is smaller than dis_best, a reward is given (formula omitted in the source), where k is a fixed value adjusted according to the magnitude of RCM_err in the training environment; when RCM_err is greater than dis_best but smaller than the error of the previous episode, a reward is given (formula omitted in the source); if the error is larger than that of the previous episode, the corresponding reward is given (formula omitted in the source).
RCM offset: in each training episode, the number of RCM motions is set to 10; the Euclidean distance is used to judge the RCM offset, and the average offset over the 10 RCM motions is taken as the offset of that episode, defined as:
RCM_err = (xnew1 − x01)² + (ynew1 − y01)² + (xnew2 − x02)² + (ynew2 − y02)²
Step two: with probability ε, randomly executing an action that modifies a kinematic parameter; with probability 1−ε, executing the action that maximizes the Q table in the current state; and modifying the current value of the corresponding kinematic parameter. If the value of a kinematic parameter after an action modification deviates from its corresponding initial value by more than 5%, the modification is executed in the reverse direction with a certain probability, to prevent overfitting. The probability is given by a function f(L_modified − L_initial) (formula omitted in the source), where q is a positive number chosen so that f lies in [0, 1]; L_modified is the value of the modified kinematic parameter and L_initial is its initial value.
The specific value of ε is given by a formula (omitted in the source) in which i = 0, 1, 2, …, T, where T is the number of training episodes.
Step three: controlling the robot to execute RCM motion;
Step four: photographing the tip of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and obtaining the tip coordinates using Harris corner detection; calculating the new state S′ from the coordinates before and after the RCM motion, and calculating the reward r obtained by the action according to the reward function;
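Tip detection with Harris corners might look like the following OpenCV sketch; taking the strongest corner response as the tip is a simplifying assumption:

```python
import cv2
import numpy as np

def detect_tip(image_bgr):
    """Return (x, y) of the strongest Harris corner, used here as the tool tip."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
    y, x = np.unravel_index(np.argmax(response), response.shape)
    return int(x), int(y)
```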
Step five: in the new state S′, assuming the action a′ that maximizes the Q table is executed, and updating the Q table using the Markov property; the tip of the robot's execution element is initialized to the center of the two camera images, preventing the tip from drifting out of the image area over many training episodes;
In this embodiment, the Q table is updated according to the following formula:
Q(s,a)=Q(s,a)+α(r+γQ(s′,a′)-Q(s,a))
where s is the current state of the robot, a is the action currently taken, α and γ are decay factors, r is the reward given by the environment after the robot takes the action, s′ is the state at the next moment, and a′ is the action assumed to be taken in s′.
Step six: judging whether the number of training episodes has reached T; if not, executing steps two to five again; if so, executing step seven;
Step seven: comparing the current values of the kinematic parameters with the manually corrected parameters; if the average positioning accuracy of the execution-element tip meets the threshold requirement, the optimal parameters are obtained and the output Q table is saved.
In this embodiment, the tip of the robot's execution element serves as the fixed RCM point, and the images captured by the two cameras serve as the basis for judging the RCM point offset. Reinforcement learning is then performed: following the ε-greedy strategy, the robot executes the corresponding kinematic-parameter modification action in each state until the judged RCM point offset meets the threshold requirement, i.e., the tip error of the execution element is smaller than the preset threshold. At that point the optimal values of the robot's kinematic parameters are obtained, and subsequent adjustment can be performed according to the Q table.
The beneficial effects of this embodiment are: the cameras acquire images of the execution-element tip and the RCM point offset is judged from the images, so no additional measuring equipment is needed to measure error values, which reduces equipment cost; constraining the method with the execution-element tip as a fixed RCM point, combined with reinforcement learning, removes the need for coordinate conversion from the robot base to the end effector, calibration of camera intrinsic and extrinsic parameters, hand-eye calibration, and similar procedures, which simplifies the calculation; and because the time, labor cost, and repetition of parameter calibration are reduced, the operating threshold of parameter calibration is lowered.
Example 2
An embodiment of a system for robot kinematic parameter correction, configured to perform the reinforcement-learning-based method of Embodiment 1. As shown in fig. 2, it comprises a robot 1; two mutually perpendicular cameras for photographing the robot tip, namely a first industrial camera 2 and a second industrial camera 3; and a controller electrically connected to the robot 1, the first industrial camera 2, and the second industrial camera 3 respectively.
Example 3
A computer-readable storage medium storing a computer program that, when executed by a processor, implements the reinforcement learning-based robot kinematic parameter correction method of embodiment 1.
It is to be understood that the above examples of the invention are provided by way of illustration only and do not limit its embodiments. Other variations or modifications in light of the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention is intended to be covered by the following claims.
Claims (10)
1. A reinforcement-learning-based robot kinematics parameter correction method, characterized by comprising the following steps:
Step one: selecting the kinematic parameters to be modified, their initial values, the number of training episodes T, the initial state S0, and the reward function, and establishing a Q-learning model;
Step two: with probability ε, randomly executing an action that modifies a kinematic parameter; with probability 1−ε, executing the action that maximizes the Q table in the current state; and modifying the current value of the corresponding kinematic parameter;
Step three: controlling the robot to execute an RCM motion;
Step four: photographing the tip of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and identifying the tip coordinates of the execution element; calculating the new state S′ from the coordinates before and after the RCM motion, and calculating the reward r obtained by the action according to the reward function;
Step five: in the new state S′, assuming the action a′ that maximizes the Q table is executed, and updating the Q table using the Markov property;
Step six: judging whether the number of training episodes has reached T; if not, executing steps two to five again; if so, executing step seven;
Step seven: comparing the current values of the kinematic parameters with the manually corrected parameters; if the average positioning accuracy of the execution-element tip meets the threshold requirement, the optimal parameters are obtained and the output Q table is saved.
2. The reinforcement-learning-based robot kinematic parameter correction method according to claim 1, wherein the Q-learning model is as follows:
State space: before the robot executes the RCM motion, the image coordinates read from the two cameras are (x01, y01) and (x02, y02); after the RCM motion, the images are re-captured and the execution-element tip is identified, giving tip coordinates (xnew1, ynew1) and (xnew2, ynew2) in the two camera images; the x and y image coordinates before and after the RCM motion are differenced, each difference having three possible results (increased, decreased, unchanged), so the difference results of the four parameters generate 3×3×3×3 = 81 states, defined as states 0 to 80, i.e., eighty-one states;
Action space: a correction precision value and a modification mode are set for each kinematic parameter, the modification mode being to increase or decrease the current value of the kinematic parameter by the correction precision value; increasing or decreasing each kinematic parameter, or taking no action, each constitutes one action in the action space;
Reward function: the minimum RCM offset observed during training is recorded as dis_best; when the error RCM_err of the current training episode is smaller than dis_best, a reward is given (formula omitted in the source), where k is a fixed value adjusted according to the magnitude of RCM_err in the training environment; when RCM_err is greater than dis_best but smaller than the error of the previous episode, a reward is given (formula omitted in the source); if the error is larger than that of the previous episode, the corresponding reward is given (formula omitted in the source).
3. The reinforcement-learning-based robot kinematics parameter correction method according to claim 2, wherein the RCM offset is defined as follows: in each training episode, a number of RCM motions is set; the Euclidean distance is used to judge the RCM offset, and the average offset over all RCM motions is taken as the offset of that episode:
RCM_err = (xnew1 − x01)² + (ynew1 − y01)² + (xnew2 − x02)² + (ynew2 − y02)²
4. The reinforcement-learning-based robot kinematic parameter correction method according to claim 3, wherein the specific value of ε is given by a formula (omitted in the source) in which i = 0, 1, 2, …, T, where T is the number of training episodes.
5. The reinforcement learning-based robot kinematics parameter correction method according to claim 3, wherein in the fifth step, the Q table is updated according to the following formula:
Q(s,a)=Q(s,a)+α(r+γQ(s′,a′)-Q(s,a))
where s is the current state of the robot, a is the action currently taken, α and γ are decay factors, r is the reward given by the environment after the robot takes the action, s′ is the state at the next moment, and a′ is the action assumed to be taken in s′.
6. The reinforcement-learning-based robot kinematic parameter correction method according to claim 1, wherein in step one the kinematic parameters to be corrected and their initial values are selected as follows: construct the expression of the robot's execution tip in the base coordinate system, taking the kinematic parameters appearing in the expression as candidate parameters; manually adjust the candidate parameters, check the tuning results, and select the kinematic parameters to be modified according to the results.
7. The method according to claim 1, wherein in step two, if the value of a kinematic parameter after an action modification deviates from its corresponding initial value by more than 5%, the modification is executed in the reverse direction with a certain probability.
8. The method for correcting the kinematic parameters of the robot based on reinforcement learning according to claim 1, wherein in the fourth step, the end coordinates of the actuator are obtained by using Harris corner recognition.
9. The reinforcement learning-based robot kinematics parameter correction method according to claim 1, wherein in the fifth step, the end of the actuator of the robot is initialized to the center of the two camera images.
10. A system for robot kinematic parameter correction, characterized by being configured to perform the reinforcement-learning-based robot kinematic parameter correction method of any one of claims 1 to 9; the system comprises a robot, two mutually perpendicular cameras for photographing the robot tip, and a controller electrically connected to the robot and the cameras respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410149562.6A | 2024-02-01 | 2024-02-01 | Robot kinematics parameter correction method and system based on reinforcement learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN117944049A (en) | 2024-04-30
Family
ID=90804847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410149562.6A Pending CN117944049A (en) | 2024-02-01 | 2024-02-01 | Robot kinematics parameter correction method and system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117944049A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN118238148A * | 2024-05-13 | 2024-06-25 | 北京迁移科技有限公司 | Hand-eye calibration system, method, readable storage medium and computer program product
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |