CN117464683A - Method for controlling mechanical arm to simulate video motion - Google Patents
- Publication number: CN117464683A
- Application number: CN202311571826.9A
- Authority: CN
- Country: China
- Prior art keywords: key feature; coordinate system; feature points; target object; pixel
- Legal status: Pending
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1669—Programme controls characterised by special application, e.g. multi-arm co-operation, assembly, grasping
- B25J9/1664—Programme controls characterised by motion, path, trajectory planning
Abstract
A method for controlling a mechanical arm to simulate video actions, which relates to a mechanical arm and a camera; the mechanical arm comprises a base, an arm section and a clamping jaw connected in sequence, the clamping jaw being used for clamping a target object. The method comprises the following steps: 1, acquiring the position of the target object in the camera coordinate system; 2, acquiring the pixel position of the target object on the image; 3, extracting key feature points of the target object from the image; 4, acquiring a data set and training a neural network; 5, mapping the pose of the target object in the demonstration video to real space; 6, mapping the spatial coordinates of the target object to the base coordinate system. The invention realizes feature recognition of the target object clamped by the mechanical arm in a video, accurately perceives the real three-dimensional spatial operation trajectory of the target object, and thereby controls the mechanical arm to stably copy/restore the operations.
Description
Technical Field
The invention relates to the technical field of robot automatic control, in particular to a method for controlling a mechanical arm to simulate video actions.
Background
With the development of computer and robotics, traditional manual assembly is being gradually replaced by industrial robotic assembly. The wide application of industrial robots greatly improves assembly efficiency and reliability. At present, on an industrial production line, a teaching programming or off-line programming method is mainly used for ensuring that a robot automatically completes an assembly task. However, when different types of assembly tasks need to be handled, the programming effort is significant. In addition, due to the low level of human-machine cooperation, conventional robot programming methods are difficult to apply to complex assembly scenarios.
Man-machine cooperation assembly based on artificial intelligence is actively developed, which further improves assembly efficiency, quality, flexibility and sustainability, and can be suitable for various future assembly scenes. In human-computer collaboration based on artificial intelligence, vision is an important way for robots to obtain environmental information. However, it is difficult for the robot to perform feature recognition on an unknown object from a video, and in particular, it is difficult to accurately perceive a real three-dimensional space operation trajectory of a target object from the video and stably reproduce the operations.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a method for controlling a mechanical arm to simulate video actions, which realizes feature recognition on unknown objects in video and accurate perception of real three-dimensional space operation tracks of target objects, thereby stably copying the operations.
The technical scheme of the invention is as follows: a method for controlling a mechanical arm to simulate video actions, which relates to the mechanical arm and a camera; the mechanical arm comprises a base, an arm section and a clamping jaw which are sequentially connected, and the clamping jaw is used for clamping a target object; the target object is marked as P; the camera is a depth camera;
the method comprises the following steps:
S01, acquiring the position of the target object in the camera coordinate system:
Calibrate the hand-eye system and establish the coordinate conversion relation between the base and the camera, using this relation to determine the conversion matrix Matrix_trans between the base coordinate system and the camera coordinate system. Since the pose of the clamping jaw in the base coordinate system can be derived from the state of the mechanical arm, the target object is assumed to lie at the clamping-jaw center point P_Rob: (x_Rob, y_Rob, z_Rob) in the base coordinate system; the position P_Dcam of the clamping-jaw center point P_Rob in the camera coordinate system is then calculated based on Equation 1;
Equation 1: P_Dcam = Matrix_trans^(-1) · P_Rob, where Matrix_trans is the conversion matrix from the camera coordinate system to the base coordinate system, applied in homogeneous coordinates;
S02, acquiring the pixel position of the target object on the image:
After internal reference calibration of the camera, the internal reference matrix Matrix_in shown in Equation 2 is obtained;
Equation 2: Matrix_in = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]], where f_x is the focal length of the camera along the X-axis of the captured image, c_x the optical center point along the X-axis of the captured image, f_y the focal length of the camera along the Y-axis of the captured image, and c_y the optical center point along the Y-axis of the captured image;
The position P_Dcam of the target object in the camera coordinate system is converted to the position P_pixel in the pixel coordinate system based on Equation 3;
Equation 3: P_pixel = Matrix_in · P_Dcam, where P_pixel is referred to for short as the target-object pixel coordinate point;
The target-object pixel coordinate point P_pixel is normalized to obtain the pixel position P_image of the target object, as shown in Equation 4;
Equation 4: P_image = (P_pixel[0] / P_pixel[2], P_pixel[1] / P_pixel[2]), where [0], [1] and [2] index the first, second and third coordinate values, so that P_pixel[0], P_pixel[1] and P_pixel[2] are the X-, Y- and Z-axis coordinate values of the target object in the pixel coordinate system;
S03, extracting key feature points of the target object from the image:
First, establish on the photographed image the extension-line vector from the arm-segment end center point S to the target-object pixel position P_image and screen out the points on the forward side of this vector; next, define a center region according to the position of the target area and delete the pixel points outside the center region; finally, screen out the target area according to the depth-value range set for the target scene. Then extract 3 key feature points of the target object from the target area, obtaining their positions in the pixel coordinate system. Compared with the unscreened common feature points, these 3 key feature points have relatively higher stability and discrimination, and can be detected again in images photographed by the camera under different viewing angles and illumination conditions;
In this step, the target area is the area where the target object is located;
S04, acquiring a data set and training the neural network:
Record the positions of the 3 key feature points in the pixel coordinate system and in the camera coordinate system; the mechanical arm then performs a random operation, comprising rotation and translation, on the target object, and the following two operations are performed in no particular order: I, through the conversion matrix between the base and camera coordinate systems, calculate the new positions of the clamping-jaw center point and the 3 key feature points in the camera coordinate system after the operation is executed; II, acquire the target area newly generated after the operation. The random operation is repeated a number of times, and the positions of the 3 key feature points and the target area before and after each operation are recorded to form a data set for training the subsequent Keypoint-RCNN neural network. After training is completed, whenever the mechanical-arm clamping jaw in a demonstration video is observed to clamp any target object, the Keypoint-RCNN neural network automatically acquires 3 key feature points, tracks them as the demonstration video plays, and predicts their positions in the pixel coordinate system;
S05, mapping the pose of the target object in the demonstration video to real space:
Prepare in advance a demonstration video that the mechanical arm needs to imitate, in which the clamping jaw of a mechanical arm clamps an arbitrary article and executes operations; the Keypoint-RCNN neural network automatically acquires 3 key feature points from the demonstration video, tracks them as the video plays, and predicts their positions in the pixel coordinate system;
First, the built-in field-of-view parameter of the camera and the actual lengths between the key feature points in the first frame of the demonstration video are measured as prior knowledge;
then, when the camera is horizontally placed, a space coordinate system is established, in the space coordinate system, the origin is the center of the camera lens, the x-axis is arranged along the horizontal direction and is perpendicular to the axial center line of the camera lens, the y-axis is arranged along the vertical direction, and the z-axis is directed to the direction of the target object along the lens; finally, mapping coordinates of 3 key feature points automatically acquired by the Keypoint-RCNN neural network in an image coordinate system to coordinates in a space coordinate system;
S06, mapping the spatial coordinates of the target object to the base coordinate system:
The spatial coordinates of the 3 key feature points in all frames of the demonstration video are mapped to the base coordinate system through the conversion matrix Matrix_trans between the base and camera coordinate systems, giving Equation 10;
Equation 10: P_Rob_all = Matrix_trans · P_space_all, where P_space_all denotes the spatial coordinates of the 3 key feature points in all frames and P_Rob_all their base coordinates in all frames; the target position that the mechanical arm needs to reach can be calculated from these base coordinates, and a control command is then output to control the mechanical arm to act so as to reach the target position.
The invention further adopts the technical scheme that: in step S01, the hand-eye system is a two-component system consisting of the mechanical arm and the depth camera; the base coordinate system takes the base as its reference; and the clamping-jaw center point is the sphere center of the smallest sphere that can completely contain the clamping jaw.
The invention further adopts the technical scheme that: in step S05, the mapping process is as follows:
The coordinates of a key feature point in the image coordinate system are (P_cx, P_cy), where the subscript c denotes the serial number of the key feature point, P_cx the pixel coordinate of the c-th key feature point in the x direction, and P_cy its pixel coordinate in the y direction. The coordinates of a key feature point in real space are (x_c, y_c, z_c), where x_c, y_c and z_c denote the spatial coordinates of the c-th key feature point on the x-, y- and z-axes respectively;
The effective focal length EFL is calculated with reference to Equation 7;
Equation 7: EFL = H / (2 · tan(Fov / 2)), where Fov is the camera's built-in field-of-view parameter and H is the pixel height of the photographed image;
The conversion mapping of a key feature point from the image coordinate system to the spatial coordinate system gives its spatial position P_c, see Equation 8;
Equation 8: P_c = (x_c, y_c, z_c) = γ_c · (P_cx, P_cy, EFL), where the subscript c denotes the serial number of the key feature point and γ_c denotes the scale factor of the key feature point with serial number c, in mm/pixel;
A trust-region algorithm is adopted to obtain the numerical solution of Equation 8. The initial value of the trust-region algorithm is set between 0 and 1, and γ_1, γ_2 and γ_3 of the first frame of the demonstration video are solved first; the calculation result of the previous frame is then taken as the initial value of the iterative solution for the current frame, and so on, yielding the spatial coordinates of all key feature points in all frames of the demonstration video, expressed as Equation 9;
Equation 9: P_space_all = [P_1^1 … P_1^j; P_2^1 … P_2^j; P_3^1 … P_3^j], where the superscript denotes the frame number (the maximum frame number being j), the subscript denotes the serial number of the key feature point, and each column contains the spatial coordinates of the 3 key feature points in one frame.
The invention further adopts the technical scheme that: in step S05, to solve for the unknowns γ_c, the actual lengths L between any two key feature points are combined to construct a system of equations, shown as Equations 10-1, 10-2 and 10-3;
Equation 10-1: L_1-2 = ||P_1 − P_2||_2, where L_1-2 is the actual length between key feature point 1 and key feature point 2, and the expanded expressions of P_1 and P_2 are given by Equation 8;
Equation 10-2: L_1-3 = ||P_1 − P_3||_2, where L_1-3 is the actual length between key feature point 1 and key feature point 3, and the expanded expressions of P_1 and P_3 are given by Equation 8;
Equation 10-3: L_2-3 = ||P_2 − P_3||_2, where L_2-3 is the actual length between key feature point 2 and key feature point 3, and the expanded expressions of P_2 and P_3 are given by Equation 8;
By substituting Equation 8 into Equations 10-1, 10-2 and 10-3, the system can be solved for γ_c, namely γ_1, γ_2 and γ_3.
The invention further adopts the technical scheme that: in step S04, a scale-invariant feature transform (SIFT) algorithm is applied to extract a number of feature points from the target area of the photographed image while obtaining their positions in the pixel coordinate system; the 3 key feature points with the highest response values are then further selected as the main observation targets; the scale-invariant feature transform algorithm comprises the following steps:
I, scale-space extrema detection: by constructing a difference-of-Gaussians scale space, search for local maxima or minima across different scales; these extreme points are the potential key feature points;
II, key point localization: finely localize the potential key feature points to remove points with relatively low contrast and edge responses, improving the stability and reliability of the key feature points;
III, key point orientation assignment: assign one or more orientations, determined by the directions of the local image gradients, to each screened key feature point, so that the SIFT descriptor is rotation-invariant;
IV, key point descriptor generation: generate a descriptor for each screened key feature point; the descriptor is a histogram of the gradients of the region around the key point.
Compared with the prior art, the invention has the following advantages: the method realizes feature recognition of the target object clamped by the mechanical arm in the video, accurately perceives the real three-dimensional space operation track of the target object, and further controls the mechanical arm to stably copy/restore the operations.
The invention is further described below with reference to the drawings and examples.
Drawings
FIG. 1 is a schematic illustration of an extension line from the center point of the end of an arm segment to the pixel position of a target object on a photographed image;
FIG. 2 is a schematic diagram of screening out points on the forward side of an extension vector on a captured image;
FIG. 3 is a schematic diagram of defining a center region on a captured image according to a location of a target region and deleting pixels outside the center region;
fig. 4 is a schematic diagram of screening out a target area according to a depth value range set in a target scene on a photographed image.
Detailed Description
Example 1:
a method for controlling a mechanical arm to simulate video actions relates to the mechanical arm and a camera. The mechanical arm comprises a base, an arm section and a clamping jaw which are sequentially connected, and the clamping jaw is used for clamping a target object. The target object is identified as P. The camera is a depth camera.
The method comprises the following steps:
S01, acquiring the position of the target object in the camera coordinate system:
Calibrate the hand-eye system and establish the coordinate conversion relation between the base and the camera, using this relation to determine the conversion matrix Matrix_trans between the base coordinate system and the camera coordinate system. Since the pose of the clamping jaw in the base coordinate system can be derived from the state of the mechanical arm, the target object is assumed to lie at the clamping-jaw center point P_Rob: (x_Rob, y_Rob, z_Rob) in the base coordinate system; the position P_Dcam of the clamping-jaw center point P_Rob in the camera coordinate system is then calculated based on Equation 1;
Equation 1: P_Dcam = Matrix_trans^(-1) · P_Rob, where Matrix_trans is the conversion matrix from the camera coordinate system to the base coordinate system, applied in homogeneous coordinates;
In this step, the hand-eye system is a two-component system consisting of the mechanical arm and the depth camera; the base coordinate system takes the base as its reference; and the clamping-jaw center point is the sphere center of the smallest sphere that can completely contain the clamping jaw.
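The base-to-camera conversion of step S01 can be sketched in a few lines of NumPy. The calibration numbers, the frame layout, and the helper names `make_transform` and `base_to_camera` are illustrative assumptions, not values from the patent:

```python
import numpy as np

def make_transform(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative hand-eye calibration result: camera frame rotated 180 degrees
# about z and offset 0.5 m along x from the base origin (camera-to-base).
R = np.array([[-1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, 1.0]])
T_cam2base = make_transform(R, np.array([0.5, 0.0, 0.0]))

def base_to_camera(p_rob, T_cam2base):
    """Equation 1: P_Dcam = Matrix_trans^(-1) . P_Rob in homogeneous coordinates."""
    p_h = np.append(p_rob, 1.0)                 # lift to homogeneous form
    return (np.linalg.inv(T_cam2base) @ p_h)[:3]

p_rob = np.array([0.5, 0.2, 0.3])   # jaw centre point in the base frame (metres)
p_dcam = base_to_camera(p_rob, T_cam2base)
```

Working in homogeneous coordinates lets the rotation and translation be applied (and inverted) as a single matrix product.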
S02, acquiring pixel positions of a target object on an image:
After internal reference calibration of the camera, the internal reference matrix Matrix_in shown in Equation 2 is obtained;
Equation 2: Matrix_in = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]], where f_x is the focal length of the camera along the X-axis of the captured image, c_x the optical center point along the X-axis of the captured image, f_y the focal length of the camera along the Y-axis of the captured image, and c_y the optical center point along the Y-axis of the captured image;
The position P_Dcam of the target object in the camera coordinate system is converted to the position P_pixel in the pixel coordinate system based on Equation 3;
Equation 3: P_pixel = Matrix_in · P_Dcam, where P_pixel is referred to for short as the target-object pixel coordinate point;
The target-object pixel coordinate point P_pixel is normalized to obtain the pixel position P_image of the target object, as shown in Equation 4;
Equation 4: P_image = (P_pixel[0] / P_pixel[2], P_pixel[1] / P_pixel[2]), where [0], [1] and [2] index the first, second and third coordinate values, so that P_pixel[0], P_pixel[1] and P_pixel[2] are the X-, Y- and Z-axis coordinate values of the target object in the pixel coordinate system.
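Equations 2-4 amount to a standard pinhole projection followed by perspective division. A minimal sketch, where the intrinsics f_x, f_y, c_x, c_y are made-up numbers rather than calibrated values:

```python
import numpy as np

# Illustrative intrinsics (Equation 2); a real Matrix_in comes from calibration.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
Matrix_in = np.array([[fx, 0.0, cx],
                      [0.0, fy, cy],
                      [0.0, 0.0, 1.0]])

def project(p_dcam):
    """Equations 3-4: project a camera-frame point to its pixel position."""
    p_pixel = Matrix_in @ p_dcam   # Equation 3: homogeneous pixel coordinates
    # Equation 4: normalize by the third component (the depth term)
    return np.array([p_pixel[0] / p_pixel[2], p_pixel[1] / p_pixel[2]])

# A point 1 m in front of the lens, slightly right of and below the axis.
p_image = project(np.array([0.1, -0.05, 1.0]))
```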
S03, extracting key feature points of a target object from the image:
Referring to fig. 1, first establish on the photographed image the extension-line vector from the arm-segment end center point S to the target-object pixel position P_image; referring to fig. 2, screen out the points on the forward side of this vector; referring to fig. 3, then define a center region according to the position of the target area and delete the pixel points outside the center region; referring to fig. 4, finally screen out the target area according to the depth-value range set for the target scene. Extract 3 key feature points of the target object from the target area, obtaining their positions in the pixel coordinate system; compared with the unscreened common feature points, these 3 key feature points have relatively higher stability and discrimination, and can be detected again in images photographed by the camera under different viewing angles and illumination conditions.
In this step, the target area is an area where the target object is located.
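The three screening stages of step S03 might be sketched as follows. The thresholds (`half_size`, the depth range) and the flat `(u, v, depth)` point list are hypothetical stand-ins for the photographed depth image:

```python
import numpy as np

def screen_target_area(points, S, P_image, half_size=80.0, d_min=0.3, d_max=1.2):
    """Keep points that (1) lie on the forward side of the S -> P_image
    extension-line vector, (2) fall inside a square centre region around
    P_image, and (3) sit within the depth range set for the target scene."""
    v = P_image - S                                  # extension-line vector
    kept = []
    for u, w, d in points:
        p = np.array([u, w])
        if np.dot(p - S, v) <= 0:                    # behind S: discard
            continue
        if np.any(np.abs(p - P_image) > half_size):  # outside centre region
            continue
        if not (d_min <= d <= d_max):                # outside depth range
            continue
        kept.append((u, w, d))
    return kept

S = np.array([100.0, 100.0])        # arm-segment end centre point (pixels)
P_image = np.array([300.0, 250.0])  # target-object pixel position
points = [(310.0, 255.0, 0.8),      # valid target-area point
          (50.0, 60.0, 0.8),        # behind S along the vector
          (310.0, 255.0, 2.0)]      # depth outside the configured range
area = screen_target_area(points, S, P_image)
```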
S04, acquiring a data set and training a neural network:
Record the positions of the 3 key feature points in the pixel coordinate system and in the camera coordinate system; the mechanical arm then performs a random operation, comprising rotation and translation, on the target object, and the following two operations are performed in no particular order: I, through the conversion matrix between the base and camera coordinate systems, calculate the new positions of the clamping-jaw center point and the 3 key feature points in the camera coordinate system after the operation is executed; II, acquire the target area newly generated after the operation. The random operation is repeated a number of times, and the positions of the 3 key feature points and the target area before and after each operation are recorded to form a data set for training the subsequent Keypoint-RCNN neural network. After training is completed, whenever the mechanical-arm clamping jaw in a demonstration video is observed to clamp any target object, the Keypoint-RCNN neural network automatically acquires 3 key feature points, tracks them as the demonstration video plays, and predicts their positions in the pixel coordinate system.
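The data-collection loop of step S04 could be organized as below. Every name here (`random_operation`, `collect_dataset`, the `observe` callback) is a hypothetical placeholder; the real system would execute the move on the arm and read the camera between the two observations:

```python
import random

def random_operation():
    """Stand-in for a random rotation + translation of the clamping jaw."""
    return {"rotation": [random.uniform(-10.0, 10.0) for _ in range(3)],   # deg
            "translation": [random.uniform(-0.05, 0.05) for _ in range(3)]}  # m

def collect_dataset(n_samples, observe):
    """Each record pairs the 3 key feature points' pixel and camera-frame
    positions plus the target area, before and after one random operation.
    `observe` returns (pixel_pts, camera_pts, target_area)."""
    dataset = []
    for _ in range(n_samples):
        op = random_operation()
        before = observe()
        # ... the mechanical arm would execute `op` here ...
        after = observe()
        dataset.append({"op": op, "before": before, "after": after})
    return dataset

# Dummy observer so the sketch runs stand-alone.
dummy = lambda: ([(0, 0)] * 3, [(0.0, 0.0, 1.0)] * 3, None)
data = collect_dataset(5, dummy)
```

The accumulated records then serve as training samples for the Keypoint-RCNN network described in the text.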
In this step, a scale-invariant feature transform (SIFT) algorithm is applied to extract a number of feature points from the target area of the photographed image while obtaining their positions in the pixel coordinate system; the 3 key feature points with the highest response values are then further selected as the main observation targets.
The scale-invariant feature transform algorithm comprises the following steps:
I, scale-space extrema detection: by constructing a difference-of-Gaussians scale space, search for local maxima or minima across different scales; these extreme points are the potential key feature points;
II, key point localization: finely localize the potential key feature points to remove points with relatively low contrast and edge responses, improving the stability and reliability of the key feature points;
III, key point orientation assignment: assign one or more orientations, determined by the directions of the local image gradients, to each screened key feature point, so that the SIFT descriptor is rotation-invariant;
IV, key point descriptor generation: generate a descriptor for each screened key feature point; the descriptor is a histogram of the gradients of the region around the key point, normalized to the scale and orientation of the key feature point so as to ensure scale and rotation invariance.
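The final selection in this step, picking the 3 SIFT feature points with the highest response values, reduces to a sort. The sketch below assumes a detector (for example OpenCV's `cv2.SIFT_create().detect`, whose `KeyPoint.response` field carries the response value) has already produced `(x, y, response)` tuples; the numbers are invented:

```python
def top3_by_response(keypoints):
    """Return the 3 feature points with the highest detector response,
    which serve as the key feature points of the target object."""
    return sorted(keypoints, key=lambda kp: kp[2], reverse=True)[:3]

# Hypothetical (x, y, response) tuples from a SIFT pass over the target area.
detected = [(12.0, 40.0, 0.031), (55.0, 18.0, 0.090),
            (30.0, 30.0, 0.012), (44.0, 61.0, 0.075),
            (20.0, 25.0, 0.050)]
key_feature_points = top3_by_response(detected)
```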
S05, mapping the pose of the target object in the demonstration video to real space:
Prepare in advance a demonstration video that the mechanical arm needs to imitate, in which the clamping jaw of a mechanical arm clamps an arbitrary article and executes operations; the Keypoint-RCNN neural network automatically acquires 3 key feature points from the demonstration video, tracks them as the video plays, and predicts their positions in the pixel coordinate system;
First, the built-in field-of-view parameter of the camera and the actual lengths between the key feature points in the first frame of the demonstration video are measured as prior knowledge;
then, when the camera is horizontally placed, a space coordinate system is established, in the space coordinate system, the origin is the center of the camera lens, the x-axis is arranged along the horizontal direction and is perpendicular to the axial center line of the camera lens, the y-axis is arranged along the vertical direction, and the z-axis is directed to the direction of the target object along the lens; and finally, mapping coordinates of the 3 key feature points automatically acquired by the Keypoint-RCNN neural network in an image coordinate system to coordinates in a space coordinate system.
The mapping process is as follows:
The coordinates of a key feature point in the image coordinate system are (P_cx, P_cy), where the subscript c denotes the serial number of the key feature point, P_cx the pixel coordinate of the c-th key feature point in the x direction, and P_cy its pixel coordinate in the y direction. The coordinates of a key feature point in real space are (x_c, y_c, z_c), where x_c, y_c and z_c denote the spatial coordinates of the c-th key feature point on the x-, y- and z-axes respectively;
The effective focal length EFL is calculated with reference to Equation 7;
Equation 7: EFL = H / (2 · tan(Fov / 2)), where Fov is the camera's built-in field-of-view parameter and H is the pixel height of the photographed image;
The conversion mapping of a key feature point from the image coordinate system to the spatial coordinate system gives its spatial position P_c, see Equation 8;
Equation 8: P_c = (x_c, y_c, z_c) = γ_c · (P_cx, P_cy, EFL), where the subscript c denotes the serial number of the key feature point and γ_c denotes the scale factor of the key feature point with serial number c, in mm/pixel;
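Equations 7 and 8 can be checked numerically. The field of view, image height and scale factor below are illustrative, and the sketch assumes the pixel coordinates (P_cx, P_cy) are measured relative to the image center:

```python
import math

def effective_focal_length(fov_rad, height_px):
    """Equation 7: EFL = H / (2 * tan(Fov / 2)), in pixel units."""
    return height_px / (2.0 * math.tan(fov_rad / 2.0))

def back_project(p_cx, p_cy, gamma, efl):
    """Equation 8: P_c = gamma * (P_cx, P_cy, EFL); with gamma in mm/pixel
    the result is a spatial position in millimetres."""
    return (gamma * p_cx, gamma * p_cy, gamma * efl)

EFL = effective_focal_length(math.radians(60.0), 480.0)  # illustrative camera
P1 = back_project(120.0, -80.0, 0.5, EFL)                # gamma = 0.5 mm/pixel
```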
A trust-region algorithm is adopted to obtain the numerical solution of Equation 8. The initial value of the trust-region algorithm is set between 0 and 1, and γ_1, γ_2 and γ_3 of the first frame of the demonstration video are solved first; the calculation result of the previous frame is then taken as the initial value of the iterative solution for the current frame, and so on, yielding the spatial coordinates of all key feature points in all frames of the demonstration video, expressed as Equation 9;
Equation 9: P_space_all = [P_1^1 … P_1^j; P_2^1 … P_2^j; P_3^1 … P_3^j], where the superscript denotes the frame number (the maximum frame number being j), the subscript denotes the serial number of the key feature point, and each column contains the spatial coordinates of the 3 key feature points in one frame.
In this step, to solve for the unknowns γ_c, the actual lengths L between any two key feature points are combined to construct a system of equations, shown as Equations 10-1, 10-2 and 10-3;
Equation 10-1: L_1-2 = ||P_1 − P_2||_2, where L_1-2 is the actual length between key feature point 1 and key feature point 2, and the expanded expressions of P_1 and P_2 are given by Equation 8;
Equation 10-2: L_1-3 = ||P_1 − P_3||_2, where L_1-3 is the actual length between key feature point 1 and key feature point 3, and the expanded expressions of P_1 and P_3 are given by Equation 8;
Equation 10-3: L_2-3 = ||P_2 − P_3||_2, where L_2-3 is the actual length between key feature point 2 and key feature point 3, and the expanded expressions of P_2 and P_3 are given by Equation 8;
By substituting Equation 8 into Equations 10-1, 10-2 and 10-3, the system can be solved for γ_c, namely γ_1, γ_2 and γ_3.
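The system of Equations 10-1 to 10-3 has three unknowns (γ_1, γ_2, γ_3) and three length constraints. As a simplified stand-in for the patent's trust-region solver, the sketch below applies a plain Newton iteration to the same residuals; the rays and lengths are synthetic test data, not measurements:

```python
import numpy as np

def solve_gammas(v, L, gamma0, iters=100):
    """Solve ||g_a*v_a - g_b*v_b|| = L_ab for the pairs (1,2), (1,3), (2,3)
    by Newton iteration (a simplification of the trust-region approach).
    v[c] is the back-projection ray (P_cx, P_cy, EFL) of key feature point c."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    g = np.asarray(gamma0, dtype=float)
    for _ in range(iters):
        r = np.empty(3)
        J = np.zeros((3, 3))
        for k, (a, b) in enumerate(pairs):
            d = g[a] * v[a] - g[b] * v[b]
            n = np.linalg.norm(d)
            r[k] = n - L[k]
            J[k, a] = np.dot(d, v[a]) / n    # d r_k / d g_a
            J[k, b] = -np.dot(d, v[b]) / n   # d r_k / d g_b
        g = g - np.linalg.solve(J, r)
        if np.max(np.abs(r)) < 1e-12:
            break
    return g

# Synthetic check: build rays and inter-point lengths from known scale
# factors, then recover them from an initial value in (0, 1).
v = [np.array([100.0, 50.0, 400.0]),
     np.array([-80.0, 60.0, 400.0]),
     np.array([20.0, -90.0, 400.0])]
true_g = np.array([0.9, 1.1, 1.0])
L = [np.linalg.norm(true_g[a] * v[a] - true_g[b] * v[b])
     for a, b in [(0, 1), (0, 2), (1, 2)]]
gammas = solve_gammas(v, L, gamma0=[0.8, 0.8, 0.8])
```

Seeding each frame with the previous frame's result, as the text describes, keeps the iteration close to its solution and fast to converge.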
S06, mapping the space coordinates of the target object to a base coordinate system:
The spatial coordinates of the 3 key feature points in all frames of the demonstration video are mapped to the base coordinate system through the conversion matrix Matrix_trans between the base and camera coordinate systems, giving Equation 10;
Equation 10: P_Rob_all = Matrix_trans · P_space_all, where P_space_all denotes the spatial coordinates of the 3 key feature points in all frames and P_Rob_all their base coordinates in the base coordinate system in all frames; the target position that the mechanical arm needs to reach can be calculated from these base coordinates, and a control command is then output by the control system of the mechanical arm to control its action so as to reach the target position.
Claims (5)
1. A method for controlling a mechanical arm to simulate video actions, which relates to the mechanical arm and a camera; the mechanical arm comprises a base, an arm section and a clamping jaw which are sequentially connected, and the clamping jaw is used for clamping a target object; the target object is marked as P; the camera is a depth camera;
the method is characterized by comprising the following steps:
s01, acquiring the position of a target object in a camera coordinate system:
calibrating the hand-eye system to establish the coordinate conversion relation between the base and the camera, and using this relation to determine the conversion matrix between the base coordinate system and the camera coordinate system; since the pose of the clamping jaw in the base coordinate system can be derived from the state of the mechanical arm, assume the target object is at the clamping jaw center point P_Rob: (x_Rob, y_Rob, z_Rob) in the base coordinate system, and then calculate from equation 1 the position P_Dcam of the clamping jaw center point P_Rob in the camera coordinate system;
Equation 1: P_Dcam = T · P_Rob; where T denotes the conversion matrix between the base coordinate system and the camera coordinate system determined above;
s02, acquiring pixel positions of a target object on an image:
after intrinsic calibration of the camera, the intrinsic matrix Matrix_in shown in equation 2 is obtained;
Equation 2: Matrix_in = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]; where f_x is the focal length of the camera along the X-axis of the captured image, c_x is the optical center along the X-axis of the captured image, f_y is the focal length along the Y-axis of the captured image, and c_y is the optical center along the Y-axis of the captured image;
the position P_Dcam of the target object in the camera coordinate system is converted to the position P_pixel in the pixel coordinate system according to equation 3;
Equation 3: P_pixel = Matrix_in · P_Dcam; where P_pixel is referred to as the pixel coordinate point of the target object;
the pixel coordinate point P_pixel of the target object is normalized to obtain the pixel position P_image of the target object, as shown in equation 4;
Equation 4: P_image = (P_pixel[0] / P_pixel[2], P_pixel[1] / P_pixel[2]); where [0], [1] and [2] index the first, second and third coordinate values, so that P_pixel[0] is the X-axis coordinate value, P_pixel[1] the Y-axis coordinate value and P_pixel[2] the Z-axis coordinate value of the target object in the pixel coordinate system;
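Equations 2-4 amount to a standard pinhole projection, which can be sketched as follows. The focal lengths, optical center and test point are made-up values for illustration, not taken from the patent.

```python
# Illustrative sketch of equations 2-4: build the intrinsic matrix, project a
# camera-frame point, and normalize by the third coordinate to get the pixel
# position of the target object.
import numpy as np

fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0   # assumed calibration values
Matrix_in = np.array([[fx, 0.0, cx],
                      [0.0, fy, cy],
                      [0.0, 0.0, 1.0]])       # equation 2

P_Dcam = np.array([0.1, -0.05, 2.0])          # target in camera frame (m)
P_pixel = Matrix_in @ P_Dcam                  # equation 3
P_image = P_pixel[:2] / P_pixel[2]            # equation 4 normalization
print(P_image)  # -> [350. 225.]
```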
s03, extracting key feature points of a target object from the image:
on the captured image, first establish an extension vector from the arm segment end center point S to the target object pixel position P_image, and screen out the points on the forward side of this extension vector; then define a central region according to the position of the target region and delete the pixel points outside the central region; finally, screen out the target region according to the depth value range set for the target scene; extract 3 key feature points of the target object from the target region to obtain their positions in the pixel coordinate system; compared with the other, unscreened common feature points, these 3 key feature points have relatively higher stability and discriminability, and can be detected again in images captured by the camera under different viewing angles and illumination conditions;
in this step, the target region is the region where the target object is located;
s04, acquiring a data set and training a neural network:
recording the positions of the 3 key feature points in the pixel coordinate system and in the camera coordinate system, then having the mechanical arm perform a random operation on the target object, the random operation comprising rotation and translation; the following two operations are performed in either order: I, calculate, through the conversion matrix between the base coordinate system and the camera coordinate system, the new positions of the clamping jaw center point and the 3 key feature points in the camera coordinate system after the operation; II, acquire the target region newly generated by the operation; the random operation is repeated many times, and the positions of the 3 key feature points and the target region before and after each operation are recorded to form a data set for training the subsequent Keypoint-RCNN neural network; after training, whenever the clamping jaw of the mechanical arm in a demonstration video is observed gripping any target object, the Keypoint-RCNN neural network automatically acquires 3 key feature points, tracks them and predicts their positions in the pixel coordinate system as the demonstration video plays;
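The dataset-recording loop of S04 can be sketched as below. The record layout, the `random_rigid_op` helper and the restriction to rotation about the z-axis are assumptions for illustration, not the patent's specification; a real implementation would drive the arm and re-project the points through the conversion matrix.

```python
# Hedged sketch: apply a random rigid operation (rotation + translation) to
# the 3 key feature points and log their positions before and after each
# operation, building a training data set.
import numpy as np

rng = np.random.default_rng(0)

def random_rigid_op(pts: np.ndarray, rng) -> np.ndarray:
    """Rotate the 3x3 point array about z by a random angle, then translate."""
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t = rng.uniform(-50.0, 50.0, size=3)          # translation in mm
    return pts @ R.T + t

dataset = []
pts = np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 40.0, 0.0]])
for _ in range(100):                              # repeat the random operation
    new_pts = random_rigid_op(pts, rng)
    dataset.append({"before": pts.copy(), "after": new_pts})
    pts = new_pts
print(len(dataset))  # -> 100
```

Because each operation is rigid, the inter-point distances are preserved across records, which is exactly the property the length constraints of equations 10-1 to 10-3 rely on.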
s05, mapping the pose of the target object in the demonstration video to real space:
a demonstration video that the mechanical arm is to imitate is prepared in advance; in the demonstration video, the clamping jaw of the mechanical arm grips an arbitrary article and performs an operation; the Keypoint-RCNN neural network automatically acquires 3 key feature points from the demonstration video, tracks them and predicts their positions in the pixel coordinate system as the video plays;
first, the camera's built-in field-of-view parameter and the actual lengths between the key feature points in the first frame of the demonstration video are measured as prior knowledge;
then, with the camera placed horizontally, a spatial coordinate system is established whose origin is the center of the camera lens, whose x-axis lies in the horizontal direction perpendicular to the axial center line of the lens, whose y-axis lies in the vertical direction, and whose z-axis points along the lens toward the target object; finally, the coordinates of the 3 key feature points automatically acquired by the Keypoint-RCNN neural network in the image coordinate system are mapped to coordinates in the spatial coordinate system;
s06, mapping the space coordinates of the target object to a base coordinate system:
the spatial coordinates of the 3 key feature points in all frames of the demonstration video are mapped to the base coordinate system through the conversion matrix obtained by hand-eye calibration, giving equation 10;
Equation 10: P_base = T · P_space; where P_space denotes the spatial coordinates of the 3 key feature points in all frames, P_base denotes their base coordinates in the base coordinate system, and T denotes the conversion matrix between the base coordinate system and the camera coordinate system; the target position to be reached by the mechanical arm can be calculated from the base coordinates, and a control command is then output to move the mechanical arm to the target position.
2. The method for controlling a robotic arm to simulate a video action of claim 1, wherein: in step S01, the hand-eye system is a two-element system consisting of the mechanical arm and the depth camera, the base coordinate system takes the base as its reference, and the clamping jaw center point is the center of the smallest sphere that can completely contain the clamping jaw.
3. The method for controlling a robotic arm to simulate a video action of claim 1, wherein: in step S05, the mapping process is as follows:
the coordinates of a key feature point in the image coordinate system are (P_cx, P_cy); where the subscript c is the serial number of the key feature point, P_cx is the pixel coordinate of the c-th key feature point in the x direction, and P_cy is its pixel coordinate in the y direction; the coordinates of a key feature point in real space are (x_c, y_c, z_c); where x_c, y_c and z_c are the spatial coordinates of the c-th key feature point along the x, y and z axes respectively;
the effective focal length EFL is calculated with reference to equation 7;
Equation 7: EFL = H / (2 · tan(Fov / 2)); where Fov is the camera's built-in field-of-view parameter and H is the pixel height of the captured image;
the transformation P_c mapping a key feature point from the image coordinate system to the spatial coordinate system is given by equation 8;
Equation 8: P_c = (x_c, y_c, z_c) = γ_c · (P_cx - c_x, P_cy - c_y, EFL); where the subscript c is the serial number of the key feature point and γ_c is the scale factor of the key feature point with serial number c, in mm/pixel;
a trust-region algorithm is adopted to obtain a numerical solution of equation 8; the initial value of the trust-region algorithm is set between 0 and 1, γ_1, γ_2 and γ_3 are first solved for the first frame of the demonstration video, and the result computed for each frame is then used as the initial value for the iterative solution of the next frame; proceeding frame by frame in this way yields the spatial coordinates of all key feature points in all frames of the demonstration video, expressed as in equation 9;
Equation 9: P_space = [P_1^(1), P_2^(1), P_3^(1) | P_1^(2), P_2^(2), P_3^(2) | … | P_1^(j), P_2^(j), P_3^(j)]; where the superscript is the frame number (the maximum frame number is j), the subscript is the serial number of the key feature point, and each block between vertical lines contains the spatial coordinates of the 3 key feature points in one frame.
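Equations 7 and 8 of claim 3 can be sketched numerically. The field-of-view value, image height and the back-projection form below (optical center plus a per-point scale factor γ_c) are illustrative assumptions, not values from the patent.

```python
# Sketch of equation 7 (effective focal length from the field-of-view
# parameter and image pixel height) and equation 8 (back-projection of an
# image point into the spatial coordinate system with scale factor gamma_c).
import math

def efl(fov_rad: float, H: int) -> float:
    """Equation 7: EFL = H / (2 * tan(Fov / 2)), in pixels."""
    return H / (2.0 * math.tan(fov_rad / 2.0))

def back_project(pcx, pcy, cx, cy, gamma, EFL):
    """Equation 8: spatial coordinates of one key feature point (mm)."""
    return (gamma * (pcx - cx), gamma * (pcy - cy), gamma * EFL)

EFL = efl(math.radians(90.0), 480)   # 90-degree FOV, 480 px tall image
print(round(EFL))                    # -> 240
print(tuple(round(v, 6) for v in back_project(420, 340, 320, 240, 0.5, EFL)))
# -> (50.0, 50.0, 120.0)
```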
4. The method of controlling a robotic arm to simulate video actions as claimed in claim 3, wherein: in step S05, the unknowns γ_c are solved by combining the actual lengths L between pairs of key feature points into a system of equations, shown as equations 10-1, 10-2 and 10-3;
Equation 10-1: L_{1-2} = ||P_1 - P_2||_2; where L_{1-2} is the actual length between key feature point 1 and key feature point 2, and the expanded expressions of P_1 and P_2 are given by equation 8;
Equation 10-2: L_{1-3} = ||P_1 - P_3||_2; where L_{1-3} is the actual length between key feature point 1 and key feature point 3, and the expanded expressions of P_1 and P_3 are given by equation 8;
Equation 10-3: L_{2-3} = ||P_2 - P_3||_2; where L_{2-3} is the actual length between key feature point 2 and key feature point 3, and the expanded expressions of P_2 and P_3 are given by equation 8;
substituting equation 8 into equations 10-1, 10-2 and 10-3 allows γ_c to be solved, γ_c comprising γ_1, γ_2 and γ_3.
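One possible implementation of the claim-4 solve (an assumption, not the patent's own code) uses SciPy's Trust Region Reflective least-squares solver, with the inter-point lengths as constraints and initial values between 0 and 1 as the claim specifies. The geometry and lengths below are synthetic, generated from a ground-truth configuration.

```python
# Solve for the scale factors gamma_1..gamma_3 of equation 8 from the length
# constraints of equations 10-1, 10-2 and 10-3.
import numpy as np
from scipy.optimize import least_squares

EFL = 100.0                 # effective focal length in pixels (illustrative)
cx = cy = 0.0               # optical center (illustrative)
img_pts = np.array([[-40.0, -30.0], [40.0, -30.0], [0.0, 50.0]])  # pixels
# Actual lengths between key feature points (prior knowledge); here taken
# from a ground-truth configuration with all gammas equal to 1.
L12, L13, L23 = 80.0, np.sqrt(8000.0), np.sqrt(8000.0)

def spatial(gammas):
    """Equation 8: back-project each image point with its own scale factor."""
    return np.array([[g * (u - cx), g * (v - cy), g * EFL]
                     for g, (u, v) in zip(gammas, img_pts)])

def residuals(gammas):
    P = spatial(gammas)
    return [np.linalg.norm(P[0] - P[1]) - L12,
            np.linalg.norm(P[0] - P[2]) - L13,
            np.linalg.norm(P[1] - P[2]) - L23]

# 'trf' is SciPy's Trust Region Reflective method; bounds keep gammas positive.
res = least_squares(residuals, x0=[0.5, 0.5, 0.5], method="trf",
                    bounds=(1e-3, 10.0))
print(np.max(np.abs(res.fun)) < 1e-6)  # residuals vanish at the solution
```

As with the perspective-three-point problem, several positive solutions can satisfy the three length equations, which is why the claims seed each frame with the previous frame's result.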
5. The method for controlling a robotic arm to simulate a video action according to claim 4, wherein: in step S04, a scale-invariant feature transform (SIFT) algorithm is applied to extract a plurality of feature points from the target region of the captured image and obtain their positions in the pixel coordinate system, and the 3 key feature points with the highest response values are then selected as the main observation targets; the scale-invariant feature transform algorithm comprises the following steps:
I, scale-space extremum detection: a Gaussian difference scale space is constructed and local maxima or minima are searched across different scales; these extremum points are the potential key feature points;
II, keypoint localization: the potential key feature points are finely localized to remove points of relatively low contrast and edge responses, improving the stability and reliability of the key feature points;
III, keypoint orientation assignment: one or more orientations, determined by the directions of the local image gradients, are assigned to each screened key feature point so that the SIFT descriptor is rotation-invariant;
IV, keypoint descriptor generation: a descriptor, a histogram of the gradients in the region around the keypoint, is generated for each screened key feature point.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311571826.9A CN117464683A (en) | 2023-11-23 | 2023-11-23 | Method for controlling mechanical arm to simulate video motion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117464683A true CN117464683A (en) | 2024-01-30 |
Family
ID=89631268
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111136659A (en) * | 2020-01-15 | 2020-05-12 | 南京大学 | Mechanical arm action learning method and system based on third person scale imitation learning |
CN111203878A (en) * | 2020-01-14 | 2020-05-29 | 北京航空航天大学 | Robot sequence task learning method based on visual simulation |
CN111300431A (en) * | 2020-03-31 | 2020-06-19 | 山东大学 | Cross-scene-oriented robot vision simulation learning method and system |
CN112180720A (en) * | 2020-09-08 | 2021-01-05 | 武汉大学 | Fiber placement process parameter model construction method and system based on simulation learning |
CN112509392A (en) * | 2020-12-16 | 2021-03-16 | 复旦大学 | Robot behavior teaching method based on meta-learning |
CN112975968A (en) * | 2021-02-26 | 2021-06-18 | 同济大学 | Mechanical arm simulation learning method based on third visual angle variable main body demonstration video |
CN113524194A (en) * | 2021-04-28 | 2021-10-22 | 重庆理工大学 | Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning |
KR20230134328A (en) * | 2022-03-14 | 2023-09-21 | 한국전자통신연구원 | Apparatus and method for teaching robot |
Non-Patent Citations (1)
Title |
---|
GAO Ge: "Operation Behavior Recognition Method for Demonstration Learning of Service Robots", China Master's Theses Full-text Database, Information Science and Technology, 15 May 2019 (2019-05-15) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||