CN117944049A - Robot kinematics parameter correction method and system based on reinforcement learning - Google Patents

Robot kinematics parameter correction method and system based on reinforcement learning

Info

Publication number
CN117944049A
Authority
CN
China
Prior art keywords
robot
rcm
motion
parameter
kinematic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410149562.6A
Other languages
Chinese (zh)
Inventor
晏丕松
林生智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weimou Medical Instrument Co ltd
Original Assignee
Guangzhou Weimou Medical Instrument Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weimou Medical Instrument Co ltd
Priority to CN202410149562.6A
Publication of CN117944049A
Legal status: Pending

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1628: Programme controls characterised by the control loop
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00: Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/0095: Means or methods for testing manipulators
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1628: Programme controls characterised by the control loop
    • B25J9/163: Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to a robot kinematics parameter correction method based on reinforcement learning, which comprises the following steps: establishing a Q-learning model; with probability ε, randomly executing an action that modifies a kinematic parameter, and with probability 1-ε, executing the action that maximizes the Q-table value in the current state, then modifying the corresponding kinematic parameter; controlling the robot to execute RCM motion; photographing the end of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and identifying the end coordinates from the images; calculating the new state S' and the reward r obtained by taking the action; updating the Q table in the new state S'; judging whether the number of training episodes meets the requirement, and if not, training again; if so, executing the next step; if the average positioning precision of the end of the execution element meets the threshold requirement, the optimal parameters are obtained and the output Q table is saved. No separate measuring equipment is needed to measure error values, which reduces equipment cost; no conversion calibration from the robot base to the end effector is required, which simplifies the calculation process.

Description

Robot kinematics parameter correction method and system based on reinforcement learning
Technical Field
The invention relates to the field of robot control, in particular to a robot kinematics parameter correction method and system based on reinforcement learning.
Background
Ophthalmic surgery generally requires highly accurate procedures to address vision disorders and other ocular diseases such as cataracts, myopia, hyperopia, and the like. Traditional ophthalmic surgery often relies on manual manipulation by the surgeon and may be affected by factors such as hand micro-tremors.
In recent years, ophthalmic surgical robotics has become an important area to address these challenges. The ophthalmic surgical robot system can provide high-precision operation, reduce surgical risks and improve surgical success rate. However, ophthalmic surgical robots are generally subject to manufacturing and assembly errors during the manufacturing process, resulting in deviations of the actual parameters of the robotic arm from nominal values. These parameter errors may lead to positioning and operating errors during surgery, thereby reducing the accuracy and success rate of the surgery.
In the prior art, different parameter calibration schemes are adopted for different surgical robots. Some surgical robots use specially made hardware devices for parameter calibration; other methods use self-developed algorithms to correct errors with low-cost measurements. For ophthalmic surgical robots, current calibration mainly relies on repeatedly making small manual adjustments to the parameters, based on experience, until the precision requirement is met.
Among these approaches: manual adjustment by experience requires no sensor equipment but is time-consuming and needs professional calibration personnel; parameter calibration systems that measure position with a laser tracker are expensive and must be recalibrated periodically; calibration of a camera device introduces new errors that are superimposed in the forward and inverse kinematic calculations, increasing the complexity of the error model and reducing the accuracy of kinematic parameter calibration; and training a neural network model to fit the robot's kinematic parameters requires data collection, cleaning, model training and testing, which is a large workload.
Disclosure of Invention
In order to solve the problems of high measuring-equipment cost, repeated calibration and complex calculation in the prior art, the invention provides a reinforcement-learning-based robot kinematic parameter correction method that reduces cost and simplifies operation while correcting the robot's kinematic parameters.
In order to solve the technical problems, the invention adopts the following technical scheme: a robot kinematics parameter correction method based on reinforcement learning comprises the following steps:
step one: selecting the kinematic parameters to be modified, their initial values, the number of training episodes T, the initial state S0 and the reward function, and establishing a Q-learning model;
Step two: with probability ε, randomly executing any action that modifies a kinematic parameter; with probability 1 - ε, executing the action that maximizes the Q-table value in the current state; and modifying the current value of the corresponding kinematic parameter. The current value is the value of the kinematic parameter at the moment step two is executed; when step two is executed for the first time, the current value is the initial value.
Step three: controlling the robot to execute RCM motion. The remote center of motion (RCM) is a key point in robotic surgery for keeping the surgical tool stable: to prevent the patient's incision from being enlarged during surgery, the robot is allowed only rotational or translational movement about this point.
Step four: photographing the end of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and identifying and acquiring the end coordinates of the execution element; calculating the new state S' from the coordinates before and after the robot executes the RCM motion, and calculating the reward r obtained by taking the action according to the reward function;
Step five: in the new state S', assuming that the action a' that maximizes the Q table is performed, and updating the Q table using the Markov property;
Step six: judging whether the number of training episodes has reached T; if not, executing steps two to five again; if so, executing step seven;
Step seven: comparing the current values of the kinematic parameters with the manually corrected parameters; if the average positioning precision of the end of the execution element meets the threshold requirement, the optimal parameters are obtained and the output Q table is saved.
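For illustration only, the loop formed by steps one to seven can be condensed into a short Python sketch. Everything in it is an assumption layered on the method as described: the decaying ε schedule, the values of α, γ and T, and the two callbacks apply_action and run_rcm_and_observe (which would wrap the real robot, its RCM motion and the two cameras) are hypothetical names and values, not part of the claimed method.

```python
import numpy as np

N_STATES, N_ACTIONS = 81, 13   # 81 coordinate-difference states, 13 actions
ALPHA, GAMMA = 0.1, 0.9        # learning rate and discount factor (assumed)
T = 500                        # number of training episodes (assumed)

def train(apply_action, run_rcm_and_observe, rng=None):
    """apply_action(a): modify the kinematic parameter selected by action a.
    run_rcm_and_observe(): execute the RCM motion, photograph the end of the
    execution element with both cameras, and return (new_state, reward)."""
    rng = rng or np.random.default_rng(0)
    Q = np.zeros((N_STATES, N_ACTIONS))
    s = 0  # initial state S0
    for i in range(T):
        eps = 1.0 - i / T                     # assumed decaying schedule
        if rng.random() < eps:                # step two: explore ...
            a = int(rng.integers(N_ACTIONS))
        else:                                 # ... or exploit the Q table
            a = int(np.argmax(Q[s]))
        apply_action(a)                       # step two: modify the parameter
        s_new, r = run_rcm_and_observe()      # steps three and four
        a_best = int(np.argmax(Q[s_new]))     # step five: assumed action a'
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_new, a_best] - Q[s, a])
        s = s_new
    return Q  # step seven: the Q table is saved after the accuracy check
```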
In the above technical solution, the end of the robot's execution element serves as a fixed RCM point, and the images captured by the two cameras serve as the basis for judging the RCM point offset. Reinforcement learning is then carried out: following the ε-greedy strategy, the robot executes the corresponding kinematic-parameter modification actions in different states until the judged RCM point offset meets the threshold requirement, that is, until the end error of the execution element is smaller than the preset threshold. At that point the optimal values of the robot's kinematic parameters have been obtained, and subsequent adjustment can be performed according to the Q table.
Preferably, the Q-learning model is specifically as follows:
State space: before the robot executes the RCM motion, the image coordinates read by the two cameras are (x01, y01) and (x02, y02). After the robot executes the RCM motion, the images are captured again and the end of the execution element is identified; the end coordinates in the two camera images are (xnew1, ynew1) and (xnew2, ynew2) respectively. Taking the difference of each image coordinate x and y before and after the RCM motion, each difference has three possible results: increased, decreased or unchanged. The difference results of the four coordinates therefore produce 3×3×3×3 = 81 states, so the states are defined as state = 0-80, i.e., eighty-one states;
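As an illustrative sketch of this encoding, the four per-coordinate difference results can be packed into a single base-3 index in 0-80; the function name and the dead-band tolerance below are assumptions:

```python
def encode_state(before, after, tol=1e-6):
    """Pack the four coordinate differences into one of the 81 states.
    `before` and `after` are (x1, y1, x2, y2) tuples from the two cameras;
    each difference is classified as decreased (0), unchanged (1) or
    increased (2), giving a base-3 index in 0..80. The dead band `tol`
    for 'unchanged' is an assumed implementation detail."""
    state = 0
    for b, a in zip(before, after):
        d = a - b
        digit = 2 if d > tol else (0 if d < -tol else 1)
        state = state * 3 + digit
    return state

# Example: all four coordinates unchanged -> the middle state 40
print(encode_state((10.0, 5.0, 8.0, 3.0), (10.0, 5.0, 8.0, 3.0)))  # 40
```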
Action space: a correction precision value and a modification mode are set for each kinematic parameter; the modification mode increases or decreases the current value of the parameter by the correction precision value. Increasing a parameter, decreasing it, or taking no action each counts as one action in the action space;
Reward function: the minimum RCM offset observed so far during training is recorded as dis_best. When the error RCM_err of the current training step is smaller than dis_best, the largest reward is given, scaled by a coefficient k; k is tuned to the magnitude of RCM_err in the training environment and then held fixed. When RCM_err is greater than dis_best but smaller than the error of the previous training step, a smaller reward is given; if the error is larger than that of the previous training step, a penalty is given.
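The exact reward expressions are not reproduced here; purely as a hedged sketch of the three-branch structure just described, the Python stub below preserves only the ordering (largest reward for a new best offset, a small reward for an improvement over the previous step, a penalty otherwise), with assumed branch values:

```python
def reward(rcm_err, dis_best, last_err, k=1.0):
    """Three-branch reward sketch; the branch values are assumptions."""
    if rcm_err < dis_best:                 # new minimum RCM offset
        return k / max(rcm_err, 1e-9)      # assumed k-scaled form
    if rcm_err < last_err:                 # better than the previous step
        return 0.1                         # assumed small reward
    return -1.0                            # assumed penalty
```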
Preferably, the RCM offset is computed as follows: in each training step, a number of RCM motions is set; the Euclidean distance is used to judge the RCM offset, and the average offset over all the RCM motions is taken as the offset of that training step. The RCM offset is defined as:
RCM_err = (xnew1 - x01)² + (ynew1 - y01)² + (xnew2 - x02)² + (ynew2 - y02)².
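A direct transcription of this definition, with an averaging helper for the multiple RCM motions per training step (function names assumed):

```python
def rcm_err(before, after):
    """Offset per the definition above; `before` and `after` are
    (x1, y1, x2, y2) image coordinates from the two cameras."""
    return sum((a - b) ** 2 for b, a in zip(before, after))

def mean_rcm_err(coordinate_pairs):
    """Average offset over the configured number of RCM motions per step."""
    return sum(rcm_err(b, a) for b, a in coordinate_pairs) / len(coordinate_pairs)
```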
Preferably, the specific value of ε is a function of the training index i, where i = 0, 1, 2, ..., T and T is the number of training episodes.
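The precise ε(i) expression aside, step two's ε-greedy selection can be sketched as follows; the linearly decaying schedule eps = 1 - i/T is an assumption consistent with i running from 0 to T:

```python
import numpy as np

def select_action(Q, s, i, T, rng):
    """Epsilon-greedy choice of step two with an assumed decaying schedule."""
    eps = 1.0 - i / T
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))   # explore: random action
    return int(np.argmax(Q[s]))                # exploit: maximise Q(s, .)
```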
Preferably, in the fifth step, the Q table is updated according to the following formula:
Q(s,a)=Q(s,a)+α(r+γQ(s′,a′)-Q(s,a))
where s is the current state of the robot, a is the action currently taken, α is the learning rate, γ is the discount factor, r is the reward given by the environment after the robot takes the action, s′ is the state at the next moment, and a′ is the action assumed to be taken in s′.
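A one-line transcription of this update over a NumPy Q table; the α and γ values are illustrative defaults:

```python
import numpy as np

def q_update(Q, s, a, r, s_new, alpha=0.1, gamma=0.9):
    """One application of the update rule above on a NumPy Q table."""
    a_best = int(np.argmax(Q[s_new]))   # greedy action a' assumed in state s'
    Q[s, a] += alpha * (r + gamma * Q[s_new, a_best] - Q[s, a])
```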
Preferably, in step one, the specific process of selecting the kinematic parameters to be modified is as follows: constructing the expression of the robot's execution end in the base coordinate system, and taking the kinematic parameters appearing in the expression as candidate parameters; manually adjusting the candidate parameters, checking the tuning results, and selecting the kinematic parameters to be modified according to the tuning results. This determines the kinematic parameters that most need correction and whose correction most improves the robot's RCM point offset, reducing the number of parameters to be corrected and the amount of calculation.
Preferably, in step two, if the value of a kinematic parameter after an action modification deviates from the initial value of that parameter by more than 5%, the modification is executed in the reverse direction with a certain probability, to prevent overfitting.
The probability is given by a function f(Lmodified - Linitial), where Lmodified is the value of the kinematic parameter after modification, Linitial is its initial value, and q is a positive coefficient chosen so that the probability f(Lmodified - Linitial) lies in the range [0, 1].
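A sketch of this safeguard; the clipped-linear form of f and the coefficient q below are assumptions chosen only so that the probability stays in [0, 1]:

```python
def maybe_reverse(L_modified, L_initial, rng, q=10.0):
    """Anti-overfitting safeguard: once the parameter drifts more than 5%
    from its initial value, reverse the modification with probability
    f(L_modified - L_initial). The form of f here is assumed."""
    drift = abs(L_modified - L_initial)
    if drift / abs(L_initial) > 0.05:
        p = min(1.0, drift / q)        # assumed f: larger drift, higher chance
        return rng.random() < p        # True -> apply the modification in reverse
    return False
```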
Preferably, in step four, the end coordinates of the execution element are obtained using Harris corner detection.
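Using OpenCV's standard Harris corner detector, one possible sketch follows; treating the instrument tip as the strongest corner response in the frame is an assumption about the scene, not a detail given in the patent:

```python
import cv2
import numpy as np

def tip_coordinates(image_bgr):
    """Locate the end of the execution element from the Harris corner
    response, assuming the tip is the strongest corner in the frame."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    response = cv2.cornerHarris(gray, 2, 3, 0.04)  # blockSize=2, ksize=3, k=0.04
    y, x = np.unravel_index(int(np.argmax(response)), response.shape)
    return float(x), float(y)  # image coordinates (x, y) of the detected tip
```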
Preferably, in step five, the end of the execution element of the robot is initialized to the center of the two camera images, to prevent the end from drifting out of the image area over repeated training.
A system for robot kinematic parameter correction, for performing the reinforcement-learning-based robot kinematic parameter correction method described above, comprising a robot body, two mutually perpendicular cameras for photographing the end of the robot body, and a controller electrically connected to the robot body and to the cameras.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the reinforcement-learning-based robot kinematic parameter correction method described above.
Compared with the prior art, the invention has the following beneficial effects: cameras are used to acquire images of the end of the execution element, and the RCM point offset is judged from the images, so no other measuring equipment is needed to measure error values, reducing equipment cost; the end of the execution element is used as a fixed RCM point to constrain the method and, combined with reinforcement learning, no coordinate-system conversion from the robot base to the end effector, no calibration of camera intrinsic and extrinsic parameters, and no hand-eye calibration are needed, simplifying the calculation process; and because the time and labor cost of parameter calibration are reduced and its repetitiveness is reduced, the operating threshold of parameter calibration is lowered.
Drawings
FIG. 1 is a flow chart of a robot kinematic parameter correction method based on reinforcement learning of the present invention;
Fig. 2 is a schematic diagram of a robot kinematic parameter correction system based on reinforcement learning according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent.
The same or similar reference numbers in the drawings of the embodiments of the invention correspond to the same or similar components. In the description of the invention, orientation or position terms such as "upper", "lower", "left", "right", "long" and "short" are based on the orientations or positional relationships shown in the drawings; they are used only for convenience and simplification of the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. The terms describing positional relationships in the drawings are therefore for exemplary illustration only and are not to be construed as limiting the patent; those of ordinary skill in the art can understand the specific meaning of these terms according to the specific circumstances.
The technical scheme of the invention is further specifically described by the following specific embodiments with reference to the accompanying drawings:
Example 1
Fig. 1 shows an embodiment of a robot kinematic parameter correction method based on reinforcement learning, which includes the following steps:
Step one: selecting the kinematic parameters to be modified, their initial values, the number of training episodes T, the initial state S0 and the reward function, and establishing a Q-learning model;
The kinematic parameters with the largest influence on the RCM point offset are selected as the kinematic parameters to be modified. The specific process is as follows: constructing the expression of the robot's execution end in the base coordinate system, and taking the kinematic parameters appearing in the expression as candidate parameters; manually adjusting the candidate parameters, checking the tuning results, and selecting the kinematic parameters to be modified according to the tuning results. This determines the kinematic parameters that most need correction and whose correction most improves the robot's RCM point offset, reducing the number of parameters to be corrected and the amount of calculation.
In this embodiment, a robot with a parallel coupled joint mechanism (Parallel Coupled Joint Mechanism, PCJM) is taken as an example. A PCJM converts the differential displacement of two translational motions into one translation and one rotation. The parameters have the following meanings: point A is the coordinate origin; L1 and L2 are the linear displacements of the moving joints at the ends of the upper and lower motors; dm is the vertical distance between the two parallel motors; h is the length of the end effector; Ltool is the length from the midpoint of the actuator to the end of the execution element; and L and θ are the linear displacement and rotation angle of the whole PCJM, which satisfy a fixed relation with L1, L2 and dm (equation (1)).
From the forward kinematic analysis, the PCJM transformation from base point A to end point D can be derived as:
XD = cos(θ)XA - sin(θ)YA + L - Ltool·sin(θ) + h·cos(θ) (2)
YD = sin(θ)XA + cos(θ)YA + Ltool·cos(θ) + h·sin(θ) (3)
Taking the partial derivatives of XD, YD and θ with respect to L1, L2 and L5 gives the Jacobian matrix J1 (equation (4)).
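Since the matrix itself is not reproduced in this text, a symbolic SymPy sketch can rebuild part of it from equations (2) and (3). The relation between (L, θ) and (L1, L2), equation (1), is likewise not reproduced, so the differential-drive form below is an assumption, and the L5 column is omitted because equations (2) and (3) do not involve L5; treat the output as illustrative only:

```python
import sympy as sp

L1, L2, dm, Ltool, h, XA, YA = sp.symbols('L1 L2 dm Ltool h XA YA')

L = (L1 + L2) / 2                     # assumed translational component
theta = sp.atan((L1 - L2) / dm)       # assumed rotational component

XD = sp.cos(theta)*XA - sp.sin(theta)*YA + L - Ltool*sp.sin(theta) + h*sp.cos(theta)
YD = sp.sin(theta)*XA + sp.cos(theta)*YA + Ltool*sp.cos(theta) + h*sp.sin(theta)

# Jacobian of (XD, YD, theta) with respect to (L1, L2)
J1 = sp.Matrix([XD, YD, theta]).jacobian([L1, L2])
print(sp.simplify(J1))
```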
The series-parallel hybrid robot connects two PCJMs end to end directly in series through mechanical components, with a tool holder added at the end. The PCJM near the base is PCJM1 and the one near the end is PCJM2. The translational displacement and rotation angle of the first PCJM are La and θa, where La corresponds to the displacements L1 and L2 of its upper and lower parallel motors; the translational displacement and rotation angle of the second PCJM are Lb and θb, corresponding to the upper and lower parallel motor displacements L3 and L4. A vertically movable motor with displacement L5 is added to the tool holder. In the overall series-parallel robot configuration, the Ltool and L5 of the first PCJM are 0, and the second PCJM is mounted on the end of the first PCJM after a 90° rotation about the y-axis. From the forward kinematic derivation, the expression of the robot end point P (PX, PY, PZ) in the base coordinate system can be obtained as equation (5).
As can be seen from equation (5), when the end point is taken as the fixed RCM point, an error in any kinematic parameter causes an offset, and the RCM point cannot be guaranteed to remain fixed; the kinematic parameters therefore need to be corrected. Five kinematic parameters, Ltool, h1, h2, dm and dm2, are taken as the parameters to be changed; their physical meanings are as follows:
Ltool: the distance from the end of the execution element to the center line between the upper and lower axes of the PCJM parallel moving joints L3 and L4.
h1: the distance from PCJM2 to the midpoint of the ends of the two parallel moving joints L3 and L4.
h2: the distance from PCJM1 to the midpoint of the ends of the two parallel moving joints L1 and L2.
dm: the axial distance between the two parallel moving joints L3 and L4.
dm2: the axial distance between the two parallel moving joints L1 and L2.
Based on these candidate kinematic parameters, the parameters are adjusted manually and the tuning results are checked; according to the results, the kinematic parameters selected for modification are Ltool, h1 and h2.
The Q-learning model is specifically as follows:
State space: before the robot executes the RCM motion, the image coordinates read by the two cameras are (x01, y01) and (x02, y02). After the robot executes the RCM motion, the images are captured again and the end of the execution element is identified; the end coordinates in the two camera images are (xnew1, ynew1) and (xnew2, ynew2) respectively. Taking the difference of each image coordinate x and y before and after the RCM motion, each difference has three possible results: increased, decreased or unchanged. The difference results of the four coordinates therefore produce 3×3×3×3 = 81 states, so the states are defined as state = 0-80, i.e., eighty-one states;
Action space: a correction precision and a modification mode are set for each kinematic parameter; the modification mode increases or decreases the current value of the parameter by the correction precision. Increasing a parameter, decreasing it, or taking no action each counts as one action in the action space. The correction precision can take two different values: in this embodiment the correction precisions of Ltool are 0.5 mm and 1 mm, and those of h1 and h2 are 0.1 mm and 0.3 mm. Each parameter can be increased or decreased, a modification changing the current value of Ltool by ±0.5/1 mm or the current values of h1 and h2 by ±0.1/0.3 mm; together with the single no-action option, the action space contains 3×2×2+1 = 13 actions in total.
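As a quick consistency check, the 13 actions can be enumerated programmatically; the (name, delta) tuple representation is an assumption for illustration:

```python
# Three parameters (Ltool, h1, h2), each increased or decreased by one of
# its two correction precision values, plus a single no-action option.
PRECISIONS_MM = {"Ltool": (0.5, 1.0), "h1": (0.1, 0.3), "h2": (0.1, 0.3)}

ACTIONS = [("noop", 0.0)]
for name, steps in PRECISIONS_MM.items():
    for step in steps:
        ACTIONS.append((name, +step))
        ACTIONS.append((name, -step))

assert len(ACTIONS) == 13  # 3 parameters x 2 precisions x 2 directions + 1
```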
Reward function: the minimum RCM offset observed so far during training is recorded as dis_best. When the error RCM_err of the current training step is smaller than dis_best, the largest reward is given, scaled by a coefficient k; k is tuned to the magnitude of RCM_err in the training environment and then held fixed. When RCM_err is greater than dis_best but smaller than the error of the previous training step, a smaller reward is given; if the error is larger than that of the previous training step, a penalty is given.
RCM offset: in each training step, the number of RCM motions is set to 10; the Euclidean distance is used to judge the RCM offset, and the average offset over the 10 RCM motions is taken as the offset of that training step. The RCM offset is defined as:
RCM_err = (xnew1 - x01)² + (ynew1 - y01)² + (xnew2 - x02)² + (ynew2 - y02)².
Step two: with probability ε, randomly executing any action that modifies a kinematic parameter; with probability 1 - ε, executing the action that maximizes the Q-table value in the current state; and modifying the current value of the corresponding kinematic parameter. If the value of a kinematic parameter after an action modification deviates from its initial value by more than 5%, the modification is executed in the reverse direction with a certain probability, to prevent overfitting.
The probability is given by a function f(Lmodified - Linitial), where Lmodified is the value of the kinematic parameter after modification, Linitial is its initial value, and q is a positive coefficient chosen so that the probability f(Lmodified - Linitial) lies in the range [0, 1].
The specific value of ε is a function of the training index i, where i = 0, 1, 2, ..., T and T is the number of training episodes.
Step three: controlling the robot to execute RCM motion;
Step four: photographing the end of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and acquiring the end coordinates of the execution element using Harris corner detection; calculating the new state S' from the coordinates before and after the robot executes the RCM motion, and calculating the reward r obtained by taking the action according to the reward function;
Step five: in the new state S', assuming that the action a' that maximizes the Q table is performed, and updating the Q table using the Markov property; the end of the robot's execution element is initialized to the center of the two camera images, to prevent the end from drifting out of the image area over repeated training;
In this embodiment, the Q table is updated according to the following formula:
Q(s,a)=Q(s,a)+α(r+γQ(s′,a′)-Q(s,a))
where s is the current state of the robot, a is the action currently taken, α is the learning rate, γ is the discount factor, r is the reward given by the environment after the robot takes the action, s′ is the state at the next moment, and a′ is the action assumed to be taken in s′.
Step six: judging whether the number of training episodes has reached T; if not, executing steps two to five again; if so, executing step seven;
Step seven: comparing the current values of the kinematic parameters with the manually corrected parameters; if the average positioning precision of the end of the execution element meets the threshold requirement, the optimal parameters are obtained and the output Q table is saved.
In this embodiment, the end of the robot's execution element serves as a fixed RCM point, and the images captured by the two cameras serve as the basis for judging the RCM point offset. Reinforcement learning is then carried out: following the ε-greedy strategy, the robot executes the corresponding kinematic-parameter modification actions in different states until the judged RCM point offset meets the threshold requirement, that is, until the end error of the execution element is smaller than the preset threshold. At that point the optimal values of the robot's kinematic parameters have been obtained, and subsequent adjustment can be performed according to the Q table.
The beneficial effects of this embodiment are as follows: cameras are used to acquire images of the end of the execution element, and the RCM point offset is judged from the images, so no other measuring equipment is needed to measure error values, reducing equipment cost; the end of the execution element is used as a fixed RCM point to constrain the implementation and, combined with reinforcement learning, no coordinate-system conversion from the robot base to the end effector, no calibration of camera intrinsic and extrinsic parameters, and no hand-eye calibration are needed, simplifying the calculation process; and because the time and labor cost of parameter calibration are reduced and its repetitiveness is reduced, the operating threshold of parameter calibration is lowered.
Example 2
Fig. 2 shows an embodiment of a system for robot kinematic parameter correction, for performing the reinforcement-learning-based robot kinematic parameter correction method of Example 1. It comprises a robot 1; two cameras for photographing the end of the robot, with mutually perpendicular shooting directions, namely a first industrial camera 2 and a second industrial camera 3; and a controller electrically connected to the robot 1, the first industrial camera 2 and the second industrial camera 3.
Example 3
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the reinforcement-learning-based robot kinematic parameter correction method of Example 1.
It should be understood that the above examples of the invention are provided by way of illustration only and not as a limitation of the embodiments of the invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention is intended to be covered by the following claims.

Claims (10)

1. A robot kinematic parameter correction method based on reinforcement learning, characterized by comprising the following steps:
step one: selecting the kinematic parameters to be modified, their initial values, the number of training episodes T, the initial state S0 and the reward function, and establishing a Q-learning model;
step two: with probability ε, randomly executing any action that modifies a kinematic parameter; with probability 1 - ε, executing the action that maximizes the Q-table value in the current state; and modifying the current value of the corresponding kinematic parameter;
Step three: controlling the robot to execute RCM motion;
Step four: photographing the end of the robot's execution element with two cameras whose shooting directions are mutually perpendicular, and identifying and acquiring the end coordinates of the execution element; calculating the new state S' from the coordinates before and after the robot executes the RCM motion, and calculating the reward r obtained by taking the action according to the reward function;
Step five: in the new state S', assuming that the action a' that maximizes the Q table is performed, and updating the Q table using the Markov property;
Step six: judging whether the number of training episodes has reached T; if not, executing steps two to five again; if so, executing step seven;
Step seven: comparing the current values of the kinematic parameters with the manually corrected parameters; if the average positioning precision of the end of the execution element meets the threshold requirement, the optimal parameters are obtained and the output Q table is saved.
2. The reinforcement-learning-based robot kinematic parameter correction method according to claim 1, wherein the Q-learning model is specifically as follows:
State space: before the robot executes the RCM motion, the image coordinates read by the two cameras are (x01, y01) and (x02, y02). After the robot executes the RCM motion, the images are captured again and the end of the execution element is identified; the end coordinates in the two camera images are (xnew1, ynew1) and (xnew2, ynew2) respectively. Taking the difference of each image coordinate x and y before and after the RCM motion, each difference has three possible results: increased, decreased or unchanged. The difference results of the four coordinates therefore produce 3×3×3×3 = 81 states, so the states are defined as state = 0-80, i.e., eighty-one states;
action space: a correction precision value and a modification mode are set for each kinematic parameter; the modification mode increases or decreases the current value of the parameter by the correction precision value; increasing a parameter, decreasing it, or taking no action each counts as one action in the action space;
reward function: the minimum RCM offset observed so far during training is recorded as dis_best; when the error RCM_err of the current training step is smaller than dis_best, the largest reward is given, scaled by a coefficient k, where k is tuned to the magnitude of RCM_err in the training environment and then held fixed; when RCM_err is greater than dis_best but smaller than the error of the previous training step, a smaller reward is given; if the error is larger than that of the previous training step, a penalty is given.
3. The reinforcement-learning-based robot kinematic parameter correction method according to claim 2, wherein the RCM offset is computed as follows: in each training step, a number of RCM motions is set; the Euclidean distance is used to judge the RCM offset, and the average offset over all the RCM motions is taken as the offset of that training step; the RCM offset is defined as:
RCM_err = (xnew1 - x01)² + (ynew1 - y01)² + (xnew2 - x02)² + (ynew2 - y02)².
4. The reinforcement-learning-based robot kinematic parameter correction method according to claim 3, wherein the specific value of ε is a function of the training index i, where i = 0, 1, 2, ..., T and T is the number of training episodes.
5. The reinforcement learning-based robot kinematics parameter correction method according to claim 3, wherein in the fifth step, the Q table is updated according to the following formula:
Q(s,a)=Q(s,a)+α(r+γQ(s′,a′)-Q(s,a))
where s is the current state of the robot, a is the action currently taken, α is the learning rate, γ is the discount factor, r is the reward given by the environment after the robot takes the action, s′ is the state at the next moment, and a′ is the action assumed to be taken in s′.
6. The reinforcement-learning-based robot kinematic parameter correction method according to claim 1, wherein in step one, the kinematic parameters to be corrected and their initial values are selected as follows: constructing the expression of the robot's execution end in the base coordinate system, and taking the kinematic parameters appearing in the expression as candidate parameters; manually adjusting the candidate parameters, checking the tuning results, and selecting the kinematic parameters to be modified according to the tuning results.
7. The reinforcement-learning-based robot kinematic parameter correction method according to claim 1, wherein in step two, if the value of a kinematic parameter after an action modification deviates from the initial value of that parameter by more than 5%, the modification is executed in the reverse direction with a certain probability.
8. The reinforcement-learning-based robot kinematic parameter correction method according to claim 1, wherein in step four, the end coordinates of the execution element are obtained using Harris corner detection.
9. The reinforcement-learning-based robot kinematic parameter correction method according to claim 1, wherein in step five, the end of the execution element of the robot is initialized to the center of the two camera images.
10. A system for robot kinematic parameter correction, characterized by performing the reinforcement-learning-based robot kinematic parameter correction method of any one of claims 1-9, comprising a robot body, two mutually perpendicular cameras for photographing the end of the robot body, and a controller electrically connected to the robot body and to the cameras.
CN202410149562.6A 2024-02-01 2024-02-01 Robot kinematics parameter correction method and system based on reinforcement learning Pending CN117944049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410149562.6A CN117944049A (en) 2024-02-01 2024-02-01 Robot kinematics parameter correction method and system based on reinforcement learning


Publications (1)

Publication Number Publication Date
CN117944049A true CN117944049A (en) 2024-04-30

Family

ID=90804847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410149562.6A Pending CN117944049A (en) 2024-02-01 2024-02-01 Robot kinematics parameter correction method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117944049A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118238148A (en) * 2024-05-13 2024-06-25 北京迁移科技有限公司 Hand-eye calibration system, method, readable storage medium and computer program product


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination