CN117464683A - Method for controlling mechanical arm to simulate video motion

Method for controlling mechanical arm to simulate video motion

Info

Publication number
CN117464683A
CN117464683A (application CN202311571826.9A)
Authority
CN
China
Prior art keywords
key feature
coordinate system
feature points
target object
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311571826.9A
Other languages
Chinese (zh)
Inventor
程鹏
周俊莹
邵晨曦
程璐
巩浩
刘检华
邓新建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Machinery Productivity Promotion Center Co ltd
Original Assignee
China Machinery Productivity Promotion Center Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Machinery Productivity Promotion Center Co ltd
Priority to CN202311571826.9A
Publication of CN117464683A
Legal status: Pending


Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1669 - Programme controls characterised by programming, planning systems for manipulators characterised by special application, e.g. multi-arm co-operation, assembly, grasping
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 - Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning

Abstract

A method for controlling a mechanical arm to simulate video actions, which relates to the mechanical arm and a camera; the mechanical arm comprises a base, an arm section and a clamping jaw which are sequentially connected, and the clamping jaw is used for clamping a target object; the method comprises the following steps: 1, acquiring the position of the target object in a camera coordinate system; 2, acquiring the pixel position of the target object on the image; 3, extracting key feature points of the target object from the image; 4, acquiring a data set and training a neural network; 5, mapping the pose of the target object in the demonstration video to real space; and 6, mapping the space coordinates of the target object to the base coordinate system. The invention realizes feature recognition of the target object clamped by the mechanical arm in the video, accurately perceives the real three-dimensional space operation track of the target object, and thereby controls the mechanical arm to stably reproduce/restore the operation.

Description

Method for controlling mechanical arm to simulate video motion
Technical Field
The invention relates to the technical field of robot automatic control, in particular to a method for controlling a mechanical arm to simulate video actions.
Background
With the development of computer and robot technology, traditional manual assembly is gradually being replaced by industrial robotic assembly. The wide application of industrial robots greatly improves assembly efficiency and reliability. At present, on industrial production lines, teaching programming or off-line programming methods are mainly used to ensure that a robot completes an assembly task automatically. However, when different types of assembly tasks need to be handled, the programming workload is significant. In addition, due to the low level of human-machine cooperation, conventional robot programming methods are difficult to apply to complex assembly scenarios.
Human-machine collaborative assembly based on artificial intelligence is being actively developed; it further improves assembly efficiency, quality, flexibility and sustainability, and can adapt to a variety of future assembly scenarios. In human-machine collaboration based on artificial intelligence, vision is an important way for a robot to obtain environmental information. However, it is difficult for a robot to perform feature recognition on an unknown object from a video, and in particular it is difficult to accurately perceive the real three-dimensional space operation trajectory of a target object from the video and to stably reproduce those operations.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for controlling a mechanical arm to simulate video actions, which realizes feature recognition of unknown objects in a video and accurate perception of the real three-dimensional space operation track of the target object, so that the operations can be stably reproduced.
The technical scheme of the invention is as follows: a method for controlling a mechanical arm to simulate video actions, which relates to the mechanical arm and a camera; the mechanical arm comprises a base, an arm section and a clamping jaw which are sequentially connected, and the clamping jaw is used for clamping a target object; the target object is marked as P; the camera is a depth camera;
the method comprises the following steps:
s01, acquiring the position of a target object in a camera coordinate system:
calibrating a hand-eye system, establishing a coordinate conversion relation between the base and the camera, and using the relation to determine the conversion matrix $T_{Rob}^{Dcam}$ between the base coordinate system and the camera coordinate system. Since the pose of the gripper in the base coordinate system can be derived from the state of the mechanical arm, the target object is assumed to be at the gripper center point $P_{Rob} = (x_{Rob}, y_{Rob}, z_{Rob})$ in the base coordinate system; the position $P_{Dcam}$ of the gripper center point $P_{Rob}$ in the camera coordinate system is then calculated based on Equation 1;
Equation 1: $P_{Dcam} = T_{Rob}^{Dcam} \cdot P_{Rob}$
s02, acquiring pixel positions of a target object on an image:
after the camera undergoes intrinsic calibration, the intrinsic matrix $Matrix_{in}$ shown in Equation 2 is obtained;
Equation 2: $Matrix_{in} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$, where $f_x$ is the focal length of the camera along the X-axis of the captured image, $c_x$ is the optical center along the X-axis of the captured image, $f_y$ is the focal length of the camera along the Y-axis of the captured image, and $c_y$ is the optical center along the Y-axis of the captured image;
the position $P_{Dcam}$ of the target object in the camera coordinate system is converted to the position $P_{pixel}$ in the pixel coordinate system, the conversion being based on Equation 3;
Equation 3: $P_{pixel} = Matrix_{in} \cdot P_{Dcam}$, where $P_{pixel}$ is simply called the target object pixel coordinate point;
the target object pixel coordinate point $P_{pixel}$ is normalized to obtain the pixel position $P_{image}$ of the target object, as shown in Equation 4;
Equation 4: $P_{image} = \left( \dfrac{P_{pixel}[0]}{P_{pixel}[2]},\ \dfrac{P_{pixel}[1]}{P_{pixel}[2]} \right)$, where [0] denotes the index of the first coordinate value, [1] the index of the second coordinate value and [2] the index of the third coordinate value; $P_{pixel}[0]$, $P_{pixel}[1]$ and $P_{pixel}[2]$ are the X-axis, Y-axis and Z-axis coordinate values of the target object in the pixel coordinate system;
s03, extracting key feature points of a target object from the image:
firstly, an extension line vector from the arm segment end center point S to the target object pixel position $P_{image}$ is established on the captured image, and the points on the forward side of this extension line vector are screened out; a central area is then defined according to the position of the target area and the pixel points outside the central area are deleted; finally, the target area is screened out according to the depth value range set for the target scene; 3 key feature points of the target object are extracted from the target area to obtain the positions of the 3 key feature points in the pixel coordinate system, wherein the 3 key feature points have relatively higher stability and discrimination than the other, unscreened common feature points and can be detected again in images captured by the camera under different viewing angles and illumination conditions;
in the step, the target area is an area where the target object is located;
s04, acquiring a data set and training a neural network:
recording the positions of the 3 key feature points in the pixel coordinate system and their positions in the camera coordinate system, and then performing a random operation on the target object with the mechanical arm, the random operation comprising rotation and translation; the following two operations are performed in no particular order: I, the new positions of the clamping jaw center point and the 3 key feature points in the camera coordinate system after the operation are calculated through the conversion matrix $T_{Rob}^{Dcam}$; II, the newly generated target area after the operation is acquired; the random operation is repeated a plurality of times, and the positions of the 3 key feature points and the target area before and after each operation are recorded to form a data set for training the subsequent Keypoint-RCNN neural network; after training is completed, each time the clamping jaw of the mechanical arm in the demonstration video is observed to clamp any target object, the Keypoint-RCNN neural network automatically acquires 3 key feature points, tracks them and predicts their positions in the pixel coordinate system as the demonstration video plays;
s05, mapping the pose of the target object in the demonstration video to real space:
preparing in advance a demonstration video that the mechanical arm needs to imitate, in which the clamping jaw of a mechanical arm clamps an arbitrary article and performs an operation; the Keypoint-RCNN neural network automatically acquires 3 key feature points from the demonstration video, tracks them and predicts their positions in the pixel coordinate system as the demonstration video plays;
firstly, the built-in field-of-view angle of the camera and the actual lengths between the key feature points in the first frame of the demonstration video are measured as prior knowledge;
then, when the camera is horizontally placed, a space coordinate system is established, in the space coordinate system, the origin is the center of the camera lens, the x-axis is arranged along the horizontal direction and is perpendicular to the axial center line of the camera lens, the y-axis is arranged along the vertical direction, and the z-axis is directed to the direction of the target object along the lens; finally, mapping coordinates of 3 key feature points automatically acquired by the Keypoint-RCNN neural network in an image coordinate system to coordinates in a space coordinate system;
s06, mapping the space coordinates of the target object to a base coordinate system:
passing the spatial coordinates of the 3 key feature points in all frames of the demonstration video through the conversion matrix $T_{Dcam}^{Rob}$ between the camera coordinate system and the base coordinate system, mapping them to the base coordinate system to obtain Equation 10;
Equation 10: $\{P\}_{Rob} = T_{Dcam}^{Rob} \cdot \{P\}_{Dcam}$, where $\{P\}_{Dcam}$ are the spatial coordinates of the 3 key feature points in all frames and $\{P\}_{Rob}$ are the base coordinates of the 3 key feature points in all frames; the target positions to be reached by the mechanical arm can be calculated based on the base coordinates, and a control command is then output to control the mechanical arm so that it reaches the target positions.
The invention further adopts the technical scheme that: in step S01, the hand-eye system is the system composed of the mechanical arm and the depth camera, the base coordinate system takes the base as its reference, and the clamping jaw center point is the center of the smallest sphere that can completely contain the clamping jaw.
The invention further adopts the technical scheme that: in step S05, the mapping process is as follows:
the coordinates of a key feature point in the image coordinate system are $(P_{cx}, P_{cy})$, where the subscript c denotes the serial number of the key feature point, $P_{cx}$ is the pixel coordinate of the c-th key feature point in the x direction and $P_{cy}$ is its pixel coordinate in the y direction; the coordinates of the key feature point in real space are $(x_c, y_c, z_c)$, where $x_c$, $y_c$ and $z_c$ are the spatial coordinates of the c-th key feature point along the x-, y- and z-axes respectively;
the effective focal length EFL is calculated with reference to Equation 7;
Equation 7: $EFL = \dfrac{H}{2\tan(Fov/2)}$, where Fov is the camera's built-in field-of-view angle and H is the pixel height of the captured image;
the transformation mapping $P_c$ of a key feature point from the image coordinate system to the spatial coordinate system is given by Equation 8;
Equation 8: $P_c = (x_c, y_c, z_c)^{T} = \gamma_c \left( P_{cx} - \dfrac{W}{2},\ P_{cy} - \dfrac{H}{2},\ EFL \right)^{T}$, where the subscript c denotes the serial number of the key feature point, $\gamma_c$ is the scale factor of the key feature point with serial number c, in mm/pixel, and W and H are the pixel width and height of the captured image;
a trust-region algorithm is adopted to solve Equation 8 numerically; the initial value of the trust-region algorithm is set between 0 and 1, the $\gamma_1$, $\gamma_2$ and $\gamma_3$ of the first frame of the demonstration video are solved first, and the calculation result of the previous frame is then taken as the initial value of the iterative solution for the current frame; proceeding in this way gives the spatial coordinates $\{P\}_{Dcam}$ of all key feature points in all frames of the demonstration video, expressed as Equation 9;
Equation 9: $\{P\}_{Dcam} = \begin{bmatrix} P_1^{1} & P_1^{2} & \cdots & P_1^{j} \\ P_2^{1} & P_2^{2} & \cdots & P_2^{j} \\ P_3^{1} & P_3^{2} & \cdots & P_3^{j} \end{bmatrix}$, where the superscript denotes the frame number, the maximum frame number is j, the subscript denotes the serial number of the key feature point, and each column contains the spatial coordinates of the 3 key feature points in one frame.
The invention further adopts the technical scheme that: in step S05, to solve the unknown $\gamma_c$, the actual lengths L between any two key feature points are combined to construct a system of equations, shown as Equations 10-1, 10-2 and 10-3;
Equation 10-1: $L_{1\text{-}2} = \lVert P_1 - P_2 \rVert_2$, where $L_{1\text{-}2}$ is the actual length between key feature point 1 and key feature point 2; for the expanded forms of $P_1$ and $P_2$, see Equation 8;
Equation 10-2: $L_{1\text{-}3} = \lVert P_1 - P_3 \rVert_2$, where $L_{1\text{-}3}$ is the actual length between key feature point 1 and key feature point 3; for the expanded forms of $P_1$ and $P_3$, see Equation 8;
Equation 10-3: $L_{2\text{-}3} = \lVert P_2 - P_3 \rVert_2$, where $L_{2\text{-}3}$ is the actual length between key feature point 2 and key feature point 3; for the expanded forms of $P_2$ and $P_3$, see Equation 8;
by substituting Equation 8 into Equations 10-1, 10-2 and 10-3 respectively, Equations 10-1, 10-2 and 10-3 can be solved for $\gamma_c$, where $\gamma_c$ comprises $\gamma_1$, $\gamma_2$ and $\gamma_3$.
The invention further adopts the technical scheme that: in step S04, the scale-invariant feature transform (SIFT) algorithm is applied, a plurality of feature points are extracted from the target area of the captured image and their positions in the pixel coordinate system are obtained at the same time, and the 3 key feature points with the highest response values are then selected as the main observation targets; the scale-invariant feature transform algorithm comprises the following steps:
I, scale-space extremum detection: local maxima or minima are searched for in different scale spaces by constructing a difference-of-Gaussian scale space; these extremum points are potential key feature points;
II, keypoint localization: the potential key feature points are finely localized to remove points with relatively low contrast and edge response points, thereby improving the stability and reliability of the key feature points;
III, keypoint orientation assignment: one or more orientations are assigned to each screened key feature point so that the SIFT descriptor has rotation invariance; the orientations are determined by the directions of the local image gradients;
IV, keypoint descriptor generation: a descriptor is generated for each screened key feature point; the descriptor is a histogram of the gradients of the area around the keypoint.
Compared with the prior art, the invention has the following advantages: the method realizes feature recognition of the target object clamped by the mechanical arm in the video, accurately perceives the real three-dimensional space operation track of the target object, and thereby controls the mechanical arm to stably reproduce/restore the operations.
The invention is further described below with reference to the drawings and examples.
Drawings
FIG. 1 is a schematic illustration of an extension line from the center point of the end of an arm segment to the pixel position of a target object on a photographed image;
FIG. 2 is a schematic diagram of screening out points on the forward side of an extension vector on a captured image;
FIG. 3 is a schematic diagram of defining a center region on a captured image according to a location of a target region and deleting pixels outside the center region;
fig. 4 is a schematic diagram of screening out a target area according to a depth value range set in a target scene on a photographed image.
Detailed Description
Example 1:
a method for controlling a mechanical arm to simulate video actions relates to the mechanical arm and a camera. The mechanical arm comprises a base, an arm section and a clamping jaw which are sequentially connected, and the clamping jaw is used for clamping a target object. The target object is identified as P. The camera is a depth camera.
The method comprises the following steps:
s01, acquiring the position of a target object in a camera coordinate system:
calibrating a hand-eye system, establishing a coordinate conversion relation between the base and the camera, and using the relation to determine the conversion matrix $T_{Rob}^{Dcam}$ between the base coordinate system and the camera coordinate system. Since the pose of the gripper in the base coordinate system can be derived from the state of the mechanical arm, the target object is assumed to be at the gripper center point $P_{Rob} = (x_{Rob}, y_{Rob}, z_{Rob})$ in the base coordinate system; the position $P_{Dcam}$ of the gripper center point $P_{Rob}$ in the camera coordinate system is then calculated based on Equation 1.
Equation 1: $P_{Dcam} = T_{Rob}^{Dcam} \cdot P_{Rob}$
In this step, the hand-eye system is the system composed of the mechanical arm and the depth camera, the base coordinate system takes the base as its reference, and the clamping jaw center point is the center of the smallest sphere that can completely contain the clamping jaw.
S02, acquiring pixel positions of a target object on an image:
after the camera undergoes intrinsic calibration, the intrinsic matrix $Matrix_{in}$ shown in Equation 2 is obtained;
Equation 2: $Matrix_{in} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$, where $f_x$ is the focal length of the camera along the X-axis of the captured image, $c_x$ is the optical center along the X-axis of the captured image, $f_y$ is the focal length of the camera along the Y-axis of the captured image, and $c_y$ is the optical center along the Y-axis of the captured image;
the position $P_{Dcam}$ of the target object in the camera coordinate system is converted to the position $P_{pixel}$ in the pixel coordinate system, the conversion being based on Equation 3;
Equation 3: $P_{pixel} = Matrix_{in} \cdot P_{Dcam}$, where $P_{pixel}$ is simply called the target object pixel coordinate point;
the target object pixel coordinate point $P_{pixel}$ is normalized to obtain the pixel position $P_{image}$ of the target object, as shown in Equation 4;
Equation 4: $P_{image} = \left( \dfrac{P_{pixel}[0]}{P_{pixel}[2]},\ \dfrac{P_{pixel}[1]}{P_{pixel}[2]} \right)$, where [0] denotes the index of the first coordinate value, [1] the index of the second coordinate value and [2] the index of the third coordinate value; $P_{pixel}[0]$, $P_{pixel}[1]$ and $P_{pixel}[2]$ are the X-axis, Y-axis and Z-axis coordinate values of the target object in the pixel coordinate system.
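The projection chain of Equations 2 to 4 can be sketched as follows; the intrinsic parameter values are placeholders, and a real system would use the values obtained from its own intrinsic calibration:

```python
import numpy as np

# Equation 2: intrinsic matrix from calibration (illustrative values).
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
Matrix_in = np.array([
    [fx, 0.0, cx],
    [0.0, fy, cy],
    [0.0, 0.0, 1.0],
])

def camera_to_pixel(p_dcam):
    """Equations 3 and 4: project a camera-frame point with the intrinsic
    matrix, then normalize by the third coordinate to get the pixel position."""
    p_pixel = Matrix_in @ np.asarray(p_dcam, dtype=float)  # Equation 3
    p_image = p_pixel[:2] / p_pixel[2]                     # Equation 4
    return p_pixel, p_image

P_Dcam = np.array([0.05, -0.02, 0.80])   # camera-frame point (placeholder)
_, P_image = camera_to_pixel(P_Dcam)
print("pixel position:", P_image)
```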
S03, extracting key feature points of a target object from the image:
referring to FIG. 1, an extension line vector from the arm segment end center point S to the target object pixel position $P_{image}$ is first established on the captured image; referring to FIG. 2, the points on the forward side of this extension line vector are screened out; referring to FIG. 3, a central area is then defined according to the position of the target area and the pixel points outside the central area are deleted; referring to FIG. 4, the target area is finally screened out according to the depth value range set for the target scene. Then 3 key feature points of the target object are extracted from the target area to obtain their positions in the pixel coordinate system; the 3 key feature points have relatively higher stability and discrimination than the other, unscreened common feature points and can be detected again in images captured by the camera under different viewing angles and illumination conditions.
In this step, the target area is an area where the target object is located.
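A possible sketch of the three screening stages of step S03 on a depth image is given below; it assumes the arm segment end point S and the target pixel position P_image are already known in pixel coordinates, interprets "forward side" as a non-negative projection onto the S-to-P_image vector, and uses illustrative thresholds and region sizes:

```python
import numpy as np

def screen_target_region(depth, S, P_image, half_width=60,
                         depth_range=(0.3, 0.9)):
    """Return a boolean mask of the target region on a depth image.

    depth       : HxW array of depth values (meters)
    S           : (u, v) pixel position of the arm segment end center point
    P_image     : (u, v) pixel position of the target object
    half_width  : half size of the central region around P_image (pixels)
    depth_range : (min, max) depth values expected for the target scene
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))

    # 1) keep points on the forward side of the extension-line vector
    #    from S towards P_image (non-negative projection onto that vector).
    d = np.array(P_image, dtype=float) - np.array(S, dtype=float)
    forward = (us - S[0]) * d[0] + (vs - S[1]) * d[1] >= 0

    # 2) keep a central region around the target position, drop the rest.
    central = (np.abs(us - P_image[0]) <= half_width) & \
              (np.abs(vs - P_image[1]) <= half_width)

    # 3) keep only pixels whose depth lies in the range set for the scene.
    in_depth = (depth >= depth_range[0]) & (depth <= depth_range[1])

    return forward & central & in_depth

# Example with synthetic data.
depth = np.full((480, 640), 0.6)
mask = screen_target_region(depth, S=(100, 400), P_image=(320, 240))
print("target-region pixels:", int(mask.sum()))
```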
S04, acquiring a data set and training a neural network:
recording the positions of the 3 key feature points in the pixel coordinate system and their positions in the camera coordinate system, and then performing a random operation on the target object with the mechanical arm, the random operation comprising rotation and translation; the following two operations are performed in no particular order: I, the new positions of the clamping jaw center point and the 3 key feature points in the camera coordinate system after the operation are calculated through the conversion matrix $T_{Rob}^{Dcam}$; II, the newly generated target area after the operation is acquired; the random operation is repeated a plurality of times, and the positions of the 3 key feature points and the target area before and after each operation are recorded to form a data set for training the subsequent Keypoint-RCNN neural network; after training is completed, each time the clamping jaw of the mechanical arm in the demonstration video is observed to clamp any target object, the Keypoint-RCNN neural network automatically acquires 3 key feature points, tracks them and predicts their positions in the pixel coordinate system as the demonstration video plays.
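As an illustration of how the recorded samples could be fed to such a network, the Keypoint R-CNN implementation in torchvision (assuming a recent torchvision release) can be configured for 3 keypoints; the target dictionary layout follows the torchvision detection API, and the tensor values and single-class setup are placeholders, not part of the invention:

```python
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn

# Keypoint R-CNN configured for one object class (plus background) and the
# 3 key feature points tracked in this method; no pretrained weights are
# downloaded here.
model = keypointrcnn_resnet50_fpn(weights=None, weights_backbone=None,
                                  num_classes=2, num_keypoints=3)

# One recorded sample: an image and its annotation (placeholder values).
image = torch.rand(3, 480, 640)
target = {
    "boxes": torch.tensor([[200.0, 150.0, 420.0, 360.0]]),   # target region
    "labels": torch.tensor([1]),
    # (x, y, visibility) for each of the 3 key feature points
    "keypoints": torch.tensor([[[250.0, 200.0, 1.0],
                                [300.0, 260.0, 1.0],
                                [380.0, 310.0, 1.0]]]),
}

model.train()
losses = model([image], [target])   # dict of detection and keypoint losses
loss = sum(losses.values())
loss.backward()                     # one illustrative training step
```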
In this step, the scale-invariant feature transform (SIFT) algorithm is applied: a plurality of feature points are extracted from the target area of the captured image and their positions in the pixel coordinate system are obtained at the same time, and the 3 key feature points with the highest response values are then selected as the main observation targets.
The scale-invariant feature transformation algorithm comprises the following steps:
I, scale-space extremum detection: local maxima or minima are searched for in different scale spaces by constructing a difference-of-Gaussian scale space; these extremum points are potential key feature points;
II, keypoint localization: the potential key feature points are finely localized to remove points with relatively low contrast and edge response points, thereby improving the stability and reliability of the key feature points;
III, keypoint orientation assignment: one or more orientations are assigned to each screened key feature point so that the SIFT descriptor has rotation invariance; the orientations are determined by the directions of the local image gradients;
IV, keypoint descriptor generation: a descriptor is generated for each screened key feature point; the descriptor is a histogram of the gradients of the area around the keypoint, expressed relative to the scale and orientation of the key feature point so as to ensure scale and rotation invariance.
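A minimal OpenCV sketch of selecting the 3 highest-response SIFT keypoints inside the screened target region is given below; the image and mask are synthetic placeholders standing in for a real capture and the mask produced in step S03:

```python
import cv2
import numpy as np

# A real captured grayscale image would be used here; random data keeps
# the sketch self-contained.
image = np.random.randint(0, 255, (480, 640), dtype=np.uint8)

# Target-region mask from step S03, as a uint8 image for OpenCV.
mask = np.zeros(image.shape, dtype=np.uint8)
mask[150:360, 200:420] = 255        # placeholder target region

sift = cv2.SIFT_create()
keypoints, _ = sift.detectAndCompute(image, mask)

# Keep the 3 keypoints with the highest response as the key feature points.
top3 = sorted(keypoints, key=lambda kp: kp.response, reverse=True)[:3]
pixel_positions = [kp.pt for kp in top3]   # positions in the pixel coordinate system
print(pixel_positions)
```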
S05, mapping the pose of the target object in the demonstration video to real space:
preparing in advance a demonstration video that the mechanical arm needs to imitate, in which the clamping jaw of a mechanical arm clamps an arbitrary article and performs an operation; the Keypoint-RCNN neural network automatically acquires 3 key feature points from the demonstration video, tracks them and predicts their positions in the pixel coordinate system as the demonstration video plays;
firstly, the built-in field-of-view angle of the camera and the actual lengths between the key feature points in the first frame of the demonstration video are measured as prior knowledge;
then, when the camera is horizontally placed, a space coordinate system is established, in the space coordinate system, the origin is the center of the camera lens, the x-axis is arranged along the horizontal direction and is perpendicular to the axial center line of the camera lens, the y-axis is arranged along the vertical direction, and the z-axis is directed to the direction of the target object along the lens; and finally, mapping coordinates of the 3 key feature points automatically acquired by the Keypoint-RCNN neural network in an image coordinate system to coordinates in a space coordinate system.
The mapping process is as follows:
the coordinates of a key feature point in the image coordinate system are $(P_{cx}, P_{cy})$, where the subscript c denotes the serial number of the key feature point, $P_{cx}$ is the pixel coordinate of the c-th key feature point in the x direction and $P_{cy}$ is its pixel coordinate in the y direction; the coordinates of the key feature point in real space are $(x_c, y_c, z_c)$, where $x_c$, $y_c$ and $z_c$ are the spatial coordinates of the c-th key feature point along the x-, y- and z-axes respectively;
the effective focal length EFL is calculated with reference to Equation 7;
Equation 7: $EFL = \dfrac{H}{2\tan(Fov/2)}$, where Fov is the camera's built-in field-of-view angle and H is the pixel height of the captured image;
the transformation mapping $P_c$ of a key feature point from the image coordinate system to the spatial coordinate system is given by Equation 8;
Equation 8: $P_c = (x_c, y_c, z_c)^{T} = \gamma_c \left( P_{cx} - \dfrac{W}{2},\ P_{cy} - \dfrac{H}{2},\ EFL \right)^{T}$, where the subscript c denotes the serial number of the key feature point, $\gamma_c$ is the scale factor of the key feature point with serial number c, in mm/pixel, and W and H are the pixel width and height of the captured image;
a trust-region algorithm is adopted to solve Equation 8 numerically; the initial value of the trust-region algorithm is set between 0 and 1, the $\gamma_1$, $\gamma_2$ and $\gamma_3$ of the first frame of the demonstration video are solved first, and the calculation result of the previous frame is then taken as the initial value of the iterative solution for the current frame; proceeding in this way gives the spatial coordinates $\{P\}_{Dcam}$ of all key feature points in all frames of the demonstration video, expressed as Equation 9;
Equation 9: $\{P\}_{Dcam} = \begin{bmatrix} P_1^{1} & P_1^{2} & \cdots & P_1^{j} \\ P_2^{1} & P_2^{2} & \cdots & P_2^{j} \\ P_3^{1} & P_3^{2} & \cdots & P_3^{j} \end{bmatrix}$, where the superscript denotes the frame number, the maximum frame number is j, the subscript denotes the serial number of the key feature point, and each column contains the spatial coordinates of the 3 key feature points in one frame.
In this step, to solve the unknown $\gamma_c$, the actual lengths L between any two key feature points are combined to construct a system of equations, shown as Equations 10-1, 10-2 and 10-3;
Equation 10-1: $L_{1\text{-}2} = \lVert P_1 - P_2 \rVert_2$, where $L_{1\text{-}2}$ is the actual length between key feature point 1 and key feature point 2; for the expanded forms of $P_1$ and $P_2$, see Equation 8;
Equation 10-2: $L_{1\text{-}3} = \lVert P_1 - P_3 \rVert_2$, where $L_{1\text{-}3}$ is the actual length between key feature point 1 and key feature point 3; for the expanded forms of $P_1$ and $P_3$, see Equation 8;
Equation 10-3: $L_{2\text{-}3} = \lVert P_2 - P_3 \rVert_2$, where $L_{2\text{-}3}$ is the actual length between key feature point 2 and key feature point 3; for the expanded forms of $P_2$ and $P_3$, see Equation 8;
by substituting Equation 8 into Equations 10-1, 10-2 and 10-3 respectively, Equations 10-1, 10-2 and 10-3 can be solved for $\gamma_c$, where $\gamma_c$ comprises $\gamma_1$, $\gamma_2$ and $\gamma_3$.
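By way of illustration, the per-frame solution of $\gamma_1$, $\gamma_2$ and $\gamma_3$ can be sketched with SciPy's trust-region reflective least-squares solver; it assumes the back-projection model of Equation 8 (pixel offsets from the image centre scaled by $\gamma$, with $z = \gamma \cdot EFL$), and the field-of-view angle, pixel positions and lengths below are placeholders:

```python
import numpy as np
from scipy.optimize import least_squares

FOV = np.deg2rad(58.0)      # built-in field-of-view angle (placeholder)
W, H = 640, 480             # pixel width / height of the video frames
EFL = H / (2.0 * np.tan(FOV / 2.0))          # Equation 7

def back_project(p_img, gamma):
    """Assumed Equation 8 model: image point -> space point (mm)."""
    u, v = p_img
    return gamma * np.array([u - W / 2.0, v - H / 2.0, EFL])

def residuals(gammas, pts_img, L12, L13, L23):
    """Equations 10-1 to 10-3: pairwise distances must match the real lengths."""
    P1, P2, P3 = (back_project(p, g) for p, g in zip(pts_img, gammas))
    return [np.linalg.norm(P1 - P2) - L12,
            np.linalg.norm(P1 - P3) - L13,
            np.linalg.norm(P2 - P3) - L23]

# Pixel positions of the 3 key feature points in one frame and the actual
# lengths (mm) measured in the first frame (all placeholders).
pts_img = [(300.0, 220.0), (340.0, 250.0), (310.0, 280.0)]
L12, L13, L23 = 40.0, 35.0, 30.0

gamma0 = np.full(3, 0.5)    # initial values between 0 and 1
sol = least_squares(residuals, gamma0, args=(pts_img, L12, L13, L23),
                    method="trf")            # trust-region reflective solver
P_space = [back_project(p, g) for p, g in zip(pts_img, sol.x)]
print(sol.x, P_space)
# For the next frame, sol.x would be reused as the initial value.
```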
S06, mapping the space coordinates of the target object to a base coordinate system:
passing the spatial coordinates of the 3 key feature points in all frames of the demonstration video through the conversion matrix $T_{Dcam}^{Rob}$ between the camera coordinate system and the base coordinate system, mapping them to the base coordinate system to obtain Equation 10.
Equation 10: $\{P\}_{Rob} = T_{Dcam}^{Rob} \cdot \{P\}_{Dcam}$, where $\{P\}_{Dcam}$ are the spatial coordinates of the 3 key feature points in all frames and $\{P\}_{Rob}$ are their base coordinates in the base coordinate system; the target positions to be reached by the mechanical arm can be calculated based on the base coordinates, and a control command is then output (by the control system that drives the mechanical arm) to control the mechanical arm so that it reaches the target positions.
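A sketch of the final mapping of Equation 10 is given below; it assumes the camera-to-base transform is the inverse of the hand-eye matrix from step S01 (placeholder values as in the earlier sketch), and the resulting base-frame points would then be handed to the arm's own motion controller, which is outside the scope of this sketch:

```python
import numpy as np

# Base -> camera hand-eye matrix (placeholder values, as in the step S01
# sketch); its inverse maps camera-frame points back to the base frame.
T_rob_to_dcam = np.array([
    [0.0, -1.0,  0.0, 0.10],
    [0.0,  0.0, -1.0, 0.55],
    [1.0,  0.0,  0.0, 0.30],
    [0.0,  0.0,  0.0, 1.00],
])
T_dcam_to_rob = np.linalg.inv(T_rob_to_dcam)

def camera_to_base(points_dcam):
    """Equation 10: map Nx3 camera-frame points to the base coordinate system."""
    pts = np.asarray(points_dcam, dtype=float)
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (T_dcam_to_rob @ pts_h.T).T[:, :3]

# Spatial coordinates of the 3 key feature points in one demonstration frame
# (illustrative values, meters).
frame_pts_dcam = np.array([[0.012, -0.034, 0.610],
                           [0.046, -0.010, 0.605],
                           [0.020,  0.015, 0.598]])
frame_pts_rob = camera_to_base(frame_pts_dcam)
print(frame_pts_rob)   # per-frame target positions for the arm's controller
```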

Claims (5)

1. A method for controlling a mechanical arm to simulate video actions, which relates to the mechanical arm and a camera; the mechanical arm comprises a base, an arm section and a clamping jaw which are sequentially connected, and the clamping jaw is used for clamping a target object; the target object is marked as P; the camera is a depth camera;
the method is characterized by comprising the following steps:
s01, acquiring the position of a target object in a camera coordinate system:
calibrating a hand-eye system, establishing a coordinate conversion relation between the base and the camera, and using the relation to determine the conversion matrix $T_{Rob}^{Dcam}$ between the base coordinate system and the camera coordinate system. Since the pose of the gripper in the base coordinate system can be derived from the state of the mechanical arm, the target object is assumed to be at the gripper center point $P_{Rob} = (x_{Rob}, y_{Rob}, z_{Rob})$ in the base coordinate system; the position $P_{Dcam}$ of the gripper center point $P_{Rob}$ in the camera coordinate system is then calculated based on Equation 1;
Equation 1: $P_{Dcam} = T_{Rob}^{Dcam} \cdot P_{Rob}$
s02, acquiring pixel positions of a target object on an image:
after the camera undergoes intrinsic calibration, the intrinsic matrix $Matrix_{in}$ shown in Equation 2 is obtained;
Equation 2: $Matrix_{in} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$, where $f_x$ is the focal length of the camera along the X-axis of the captured image, $c_x$ is the optical center along the X-axis of the captured image, $f_y$ is the focal length of the camera along the Y-axis of the captured image, and $c_y$ is the optical center along the Y-axis of the captured image;
the position $P_{Dcam}$ of the target object in the camera coordinate system is converted to the position $P_{pixel}$ in the pixel coordinate system, the conversion being based on Equation 3;
Equation 3: $P_{pixel} = Matrix_{in} \cdot P_{Dcam}$, where $P_{pixel}$ is simply called the target object pixel coordinate point;
the target object pixel coordinate point $P_{pixel}$ is normalized to obtain the pixel position $P_{image}$ of the target object, as shown in Equation 4;
Equation 4: $P_{image} = \left( \dfrac{P_{pixel}[0]}{P_{pixel}[2]},\ \dfrac{P_{pixel}[1]}{P_{pixel}[2]} \right)$, where [0] denotes the index of the first coordinate value, [1] the index of the second coordinate value and [2] the index of the third coordinate value; $P_{pixel}[0]$, $P_{pixel}[1]$ and $P_{pixel}[2]$ are the X-axis, Y-axis and Z-axis coordinate values of the target object in the pixel coordinate system;
s03, extracting key feature points of a target object from the image:
firstly, an extension line vector from the arm segment end center point S to the target object pixel position $P_{image}$ is established on the captured image, and the points on the forward side of this extension line vector are screened out; a central area is then defined according to the position of the target area and the pixel points outside the central area are deleted; finally, the target area is screened out according to the depth value range set for the target scene; 3 key feature points of the target object are extracted from the target area to obtain the positions of the 3 key feature points in the pixel coordinate system, wherein the 3 key feature points have relatively higher stability and discrimination than the other, unscreened common feature points and can be detected again in images captured by the camera under different viewing angles and illumination conditions;
in the step, the target area is an area where the target object is located;
s04, acquiring a data set and training a neural network:
recording the positions of the 3 key feature points in the pixel coordinate system and their positions in the camera coordinate system, and then performing a random operation on the target object with the mechanical arm, the random operation comprising rotation and translation; the following two operations are performed in no particular order: I, the new positions of the clamping jaw center point and the 3 key feature points in the camera coordinate system after the operation are calculated through the conversion matrix $T_{Rob}^{Dcam}$; II, the newly generated target area after the operation is acquired; the random operation is repeated a plurality of times, and the positions of the 3 key feature points and the target area before and after each operation are recorded to form a data set for training the subsequent Keypoint-RCNN neural network; after training is completed, each time the clamping jaw of the mechanical arm in the demonstration video is observed to clamp any target object, the Keypoint-RCNN neural network automatically acquires 3 key feature points, tracks them and predicts their positions in the pixel coordinate system as the demonstration video plays;
s05, mapping the pose of the target object in the demonstration video to real space:
preparing in advance a demonstration video that the mechanical arm needs to imitate, in which the clamping jaw of a mechanical arm clamps an arbitrary article and performs an operation; the Keypoint-RCNN neural network automatically acquires 3 key feature points from the demonstration video, tracks them and predicts their positions in the pixel coordinate system as the demonstration video plays;
firstly, the built-in field-of-view angle of the camera and the actual lengths between the key feature points in the first frame of the demonstration video are measured as prior knowledge;
then, when the camera is horizontally placed, a space coordinate system is established, in the space coordinate system, the origin is the center of the camera lens, the x-axis is arranged along the horizontal direction and is perpendicular to the axial center line of the camera lens, the y-axis is arranged along the vertical direction, and the z-axis is directed to the direction of the target object along the lens; finally, mapping coordinates of 3 key feature points automatically acquired by the Keypoint-RCNN neural network in an image coordinate system to coordinates in a space coordinate system;
s06, mapping the space coordinates of the target object to a base coordinate system:
passing the spatial coordinates of the 3 key feature points in all frames of the demonstration video through the conversion matrix $T_{Dcam}^{Rob}$ between the camera coordinate system and the base coordinate system, mapping them to the base coordinate system to obtain Equation 10;
Equation 10: $\{P\}_{Rob} = T_{Dcam}^{Rob} \cdot \{P\}_{Dcam}$, where $\{P\}_{Dcam}$ are the spatial coordinates of the 3 key feature points in all frames and $\{P\}_{Rob}$ are the base coordinates of the 3 key feature points in all frames; the target positions to be reached by the mechanical arm can be calculated based on the base coordinates, and a control command is then output to control the mechanical arm so that it reaches the target positions.
2. The method for controlling a robotic arm to simulate a video action of claim 1, wherein: in step S01, the hand-eye system is the system composed of the mechanical arm and the depth camera, the base coordinate system takes the base as its reference, and the clamping jaw center point is the center of the smallest sphere that can completely contain the clamping jaw.
3. The method for controlling a robotic arm to simulate a video action of claim 1, wherein: in step S05, the mapping process is as follows:
the coordinates of a key feature point in the image coordinate system are $(P_{cx}, P_{cy})$, where the subscript c denotes the serial number of the key feature point, $P_{cx}$ is the pixel coordinate of the c-th key feature point in the x direction and $P_{cy}$ is its pixel coordinate in the y direction; the coordinates of the key feature point in real space are $(x_c, y_c, z_c)$, where $x_c$, $y_c$ and $z_c$ are the spatial coordinates of the c-th key feature point along the x-, y- and z-axes respectively;
the effective focal length EFL is calculated with reference to Equation 7;
Equation 7: $EFL = \dfrac{H}{2\tan(Fov/2)}$, where Fov is the camera's built-in field-of-view angle and H is the pixel height of the captured image;
the transformation mapping $P_c$ of a key feature point from the image coordinate system to the spatial coordinate system is given by Equation 8;
Equation 8: $P_c = (x_c, y_c, z_c)^{T} = \gamma_c \left( P_{cx} - \dfrac{W}{2},\ P_{cy} - \dfrac{H}{2},\ EFL \right)^{T}$, where the subscript c denotes the serial number of the key feature point, $\gamma_c$ is the scale factor of the key feature point with serial number c, in mm/pixel, and W and H are the pixel width and height of the captured image;
a trust-region algorithm is adopted to solve Equation 8 numerically; the initial value of the trust-region algorithm is set between 0 and 1, the $\gamma_1$, $\gamma_2$ and $\gamma_3$ of the first frame of the demonstration video are solved first, and the calculation result of the previous frame is then taken as the initial value of the iterative solution for the current frame; proceeding in this way gives the spatial coordinates $\{P\}_{Dcam}$ of all key feature points in all frames of the demonstration video, expressed as Equation 9;
Equation 9: $\{P\}_{Dcam} = \begin{bmatrix} P_1^{1} & P_1^{2} & \cdots & P_1^{j} \\ P_2^{1} & P_2^{2} & \cdots & P_2^{j} \\ P_3^{1} & P_3^{2} & \cdots & P_3^{j} \end{bmatrix}$, where the superscript denotes the frame number, the maximum frame number is j, the subscript denotes the serial number of the key feature point, and each column contains the spatial coordinates of the 3 key feature points in one frame.
4. A method of controlling a robotic arm to simulate video actions as claimed in claim 3, wherein: in step S05, to solve the unknown $\gamma_c$, the actual lengths L between any two key feature points are combined to construct a system of equations, shown as Equations 10-1, 10-2 and 10-3;
Equation 10-1: $L_{1\text{-}2} = \lVert P_1 - P_2 \rVert_2$, where $L_{1\text{-}2}$ is the actual length between key feature point 1 and key feature point 2; for the expanded forms of $P_1$ and $P_2$, see Equation 8;
Equation 10-2: $L_{1\text{-}3} = \lVert P_1 - P_3 \rVert_2$, where $L_{1\text{-}3}$ is the actual length between key feature point 1 and key feature point 3; for the expanded forms of $P_1$ and $P_3$, see Equation 8;
Equation 10-3: $L_{2\text{-}3} = \lVert P_2 - P_3 \rVert_2$, where $L_{2\text{-}3}$ is the actual length between key feature point 2 and key feature point 3; for the expanded forms of $P_2$ and $P_3$, see Equation 8;
by substituting Equation 8 into Equations 10-1, 10-2 and 10-3 respectively, Equations 10-1, 10-2 and 10-3 can be solved for $\gamma_c$, where $\gamma_c$ comprises $\gamma_1$, $\gamma_2$ and $\gamma_3$.
5. The method for controlling a robotic arm to simulate a video action according to claim 4, wherein: in step S04, the scale-invariant feature transform (SIFT) algorithm is applied, a plurality of feature points are extracted from the target area of the captured image and their positions in the pixel coordinate system are obtained at the same time, and the 3 key feature points with the highest response values are then selected as the main observation targets; the scale-invariant feature transform algorithm comprises the following steps:
I, scale-space extremum detection: local maxima or minima are searched for in different scale spaces by constructing a difference-of-Gaussian scale space; these extremum points are potential key feature points;
II, keypoint localization: the potential key feature points are finely localized to remove points with relatively low contrast and edge response points, thereby improving the stability and reliability of the key feature points;
III, keypoint orientation assignment: one or more orientations are assigned to each screened key feature point so that the SIFT descriptor has rotation invariance; the orientations are determined by the directions of the local image gradients;
IV, keypoint descriptor generation: a descriptor is generated for each screened key feature point; the descriptor is a histogram of the gradients of the area around the keypoint.
CN202311571826.9A 2023-11-23 2023-11-23 Method for controlling mechanical arm to simulate video motion Pending CN117464683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311571826.9A CN117464683A (en) 2023-11-23 2023-11-23 Method for controlling mechanical arm to simulate video motion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311571826.9A CN117464683A (en) 2023-11-23 2023-11-23 Method for controlling mechanical arm to simulate video motion

Publications (1)

Publication Number Publication Date
CN117464683A true CN117464683A (en) 2024-01-30

Family

ID=89631268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311571826.9A Pending CN117464683A (en) 2023-11-23 2023-11-23 Method for controlling mechanical arm to simulate video motion

Country Status (1)

Country Link
CN (1) CN117464683A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111203878A (en) * 2020-01-14 2020-05-29 北京航空航天大学 Robot sequence task learning method based on visual simulation
CN111136659A (en) * 2020-01-15 2020-05-12 南京大学 Mechanical arm action learning method and system based on third person scale imitation learning
CN111300431A (en) * 2020-03-31 2020-06-19 山东大学 Cross-scene-oriented robot vision simulation learning method and system
CN112180720A (en) * 2020-09-08 2021-01-05 武汉大学 Fiber placement process parameter model construction method and system based on simulation learning
CN112509392A (en) * 2020-12-16 2021-03-16 复旦大学 Robot behavior teaching method based on meta-learning
CN112975968A (en) * 2021-02-26 2021-06-18 同济大学 Mechanical arm simulation learning method based on third visual angle variable main body demonstration video
CN113524194A (en) * 2021-04-28 2021-10-22 重庆理工大学 Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
KR20230134328A (en) * 2022-03-14 2023-09-21 한국전자통신연구원 Apparatus and method for teaching robot

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高歌 (Gao Ge): "Operation behavior recognition method for demonstration learning of service robots", China Master's Theses Full-text Database, Information Science and Technology, 15 May 2019 (2019-05-15) *

Similar Documents

Publication Publication Date Title
CN113524194B (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN107471218B (en) Binocular vision-based hand-eye coordination method for double-arm robot
CN109308693B (en) Single-binocular vision system for target detection and pose measurement constructed by one PTZ camera
JP6011102B2 (en) Object posture estimation method
CN112233181A (en) 6D pose recognition method and device and computer storage medium
CN111721259B (en) Underwater robot recovery positioning method based on binocular vision
CN109816730B (en) Workpiece grabbing method and device, computer equipment and storage medium
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN111476841B (en) Point cloud and image-based identification and positioning method and system
CN112598729B (en) Target object identification and positioning method integrating laser and camera
CN109033989B (en) Target identification method and device based on three-dimensional point cloud and storage medium
CN111627072A (en) Method and device for calibrating multiple sensors and storage medium
CN112067233B (en) Six-degree-of-freedom motion capture method for wind tunnel model
CN108961144A (en) Image processing system
CN111784655B (en) Underwater robot recycling and positioning method
CN115816460B (en) Mechanical arm grabbing method based on deep learning target detection and image segmentation
CN112775959A (en) Method and system for determining grabbing pose of manipulator and storage medium
JP2014029664A (en) Image comparison range generation method, positional orientation detection method, image comparison range generation device, positional orientation detection device, robot, robot system, image comparison range generation program and positional orientation detection program
CN112164112A (en) Method and device for acquiring pose information of mechanical arm
Jeon et al. Underwater object detection and pose estimation using deep learning
CN115213896A (en) Object grabbing method, system and equipment based on mechanical arm and storage medium
CN115629066A (en) Method and device for automatic wiring based on visual guidance
CN112801988A (en) Object grabbing pose detection method based on RGBD and deep neural network
CN116766194A (en) Binocular vision-based disc workpiece positioning and grabbing system and method
US20230150142A1 (en) Device and method for training a machine learning model for generating descriptor images for images of objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination