CN110238840B - Mechanical arm autonomous grabbing method based on vision - Google Patents

Mechanical arm autonomous grabbing method based on vision

Info

Publication number
CN110238840B
CN110238840B (application CN201910335507.5A)
Authority
CN
China
Prior art keywords
grabbing
image
label
function
capture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910335507.5A
Other languages
Chinese (zh)
Other versions
CN110238840A (en)
Inventor
成慧
蔡俊浩
苏竟成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201910335507.5A
Publication of CN110238840A
Application granted
Publication of CN110238840B
Legal status: Active

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1602: Programme controls characterised by the control system, structure, architecture
    • B25J9/161: Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697: Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of robots, in particular to a vision-based mechanical arm autonomous grabbing method. A corrective grasping strategy based on the antipodal grasping rule is provided; using this strategy, trial-and-error grasping on a simulation platform yields grasp samples that satisfy the rule. Samples collected in this way clearly express the grasping pattern defined by the antipodal rule, which benefits model learning. The whole data collection process requires neither manual intervention nor any real-world data, avoiding the problems that real data collection may bring. With only a small amount of simulation data collected in this way, the trained model can be applied directly to different real grasping scenes. The whole training process requires neither domain adaptation nor domain randomization, and the method achieves high accuracy and robustness.

Description

Mechanical arm autonomous grabbing method based on vision
Technical Field
The invention relates to the technical field of robots, in particular to a vision-based mechanical arm autonomous grabbing method.
Background
Robotic grasping research falls mainly into two directions: analytic methods and empirical methods. Analytic methods generally construct force-closure grasps based on rules defined by four properties, namely dexterity, equilibrium, stability and dynamic certainty; such methods can usually be cast as constrained optimization problems. Empirical methods are data-driven: they extract a feature representation of the object from data and then make grasping decisions with hand-designed grasping heuristics.
As deep learning has made tremendous progress in computer vision, it has also begun to receive extensive attention and research in robotics. Pinto and Gupta ("Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours") collected a dataset of 50,000 grasps through robotic trial-and-error grasping and trained a deep neural network to decide the grasping angle; this method reaches 73% accuracy on unseen objects. Levine et al. ("Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection") collected a dataset of 800,000 grasps over two months using 6 to 14 robots and trained an evaluation model on it; the model scores candidate action commands for the current scene to find the best one, achieving about 80% grasping accuracy.
These methods achieve high grasp success rates, but they require real robots to perform trial-and-error grasping to acquire data, which is time-consuming, labor-intensive and poses significant safety hazards.
Mahler et al. ("Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics") sample object grasp points on a simulation platform based on the antipodal grasping rule and then select highly robust samples by force-closure analysis. A grasp quality evaluation neural network trained on data obtained in this way reaches a grasp success rate of up to 93% on adversarial objects. Although this method achieves high accuracy, the amount of data required to train the model is very large; one reason is that the collected samples do not clearly reflect the defined grasping pattern.
Disclosure of Invention
The invention provides a vision-based mechanical arm autonomous grasping method to overcome at least one defect of the prior art; data collected by grasping on a simulation platform with this method is favorable for model learning.
In order to solve the above technical problems, the invention adopts the following technical scheme: a vision-based mechanical arm autonomous grasping method comprising the following steps:
S1, building an environment similar to the real scene in simulation and collecting global images;
S2, processing the data, where the preprocessed data comprise a global image containing the information of the whole workspace, an object mask, and a label map with the same scale as the global image; the processing comprises: first generating an object mask from the set of pixel positions occupied by the object in the image, then generating a label mask from the object mask, the grasp pixel position and the grasp label, and generating a label map from the grasp position and the grasp label; then discretizing the grasp angle according to the grasping problem definition;
s3, training a deep neural network:
(1) normalizing the input RGB images, and then synthesizing a batch;
(2) transmitting the batch of data into a full convolution neural network to obtain an output value;
(3) computing the error between the prediction and the label with a cross-entropy loss combined with the label mask:
L(Y, F) = − Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{3} M_{ijk} Y_{ijk} log( exp(F_{ijk}) / Σ_{l=1}^{3} exp(F_{ijl}) )
where Y is the label map, M is the label mask, H and W are the height and width of the label map, i, j and k index positions in the 3-channel map, l indexes the channels, F ∈ ℝ^{H×W×3} is the output feature map of the last convolutional layer, and ℝ denotes the real number domain with the superscript giving the dimensions of the tensor;
and S4, applying the trained model to a real grabbing environment.
The invention provides a vision-based mechanical arm autonomous grasping method that trains an end-to-end deep neural network capable of pixel-level grasp prediction from a small amount of grasp data collected in a simulation environment; the learned model can be applied directly to real grasping scenes. The whole process requires neither domain adaptation nor domain randomization, and no data collected from the real environment.
Further, the step S1 specifically includes:
S11, placing a background texture, a mechanical arm with a gripper, a camera and an object to be grasped in the workspace of the simulation environment;
S12, placing an object in the workspace, selecting with the camera a position where the object is present, recording the image, the pixel position corresponding to the grasp point, the mask of the object in the image and the grasp angle, and then selecting a random angle for the mechanical arm to attempt a trial-and-error grasp;
S13, judging whether the grasp succeeds: if it fails, directly storing the image I, the set C of pixel positions occupied by the object in the image, the pixel position p of the grasp point, the grasp angle ψ and the failure label l; if it succeeds, recording again the global image I′ and the corresponding set C′ of pixel positions occupied by the object, and then storing I′, C′, the grasp-point pixel position p, the grasp angle ψ and the success label l.
Further, the grasping problem is defined as follows: a vertical planar grasp is defined as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0,1}³ is a 3-dimensional one-hot code representing the grasp function; the grasp function has three classes: graspable, non-graspable and background. Projected into image space, a grasp in image I can be represented as g̃ = (p̃, ω̃, η), where p̃ = (h, w) denotes the position of the grasp in the image and ω̃ ∈ {0, 1, …, 15} is the discretized grasp angle. Each pixel in the image can be assigned a grasp function, so the whole grasp function map can be represented as C = {C_i}_{i=0}^{15}, where C_i ∈ ℝ^{H×W×3} is the grasp function map of the image at the i-th angle, H and W are the height and width of the image, and the 3 channels represent the graspable, non-graspable and background classes respectively. The first (graspable) channel C_i^{(1)} ∈ ℝ^{H×W} is taken from each grasp function map C_i, and the channels are stacked into G ∈ ℝ^{16×H×W}; ℝ denotes the real number domain, and the superscript gives the dimensions of the tensor.
Further, the most robust grasp point is obtained by solving the following formula:
(i*, h*, w*) = argmax_{i,h,w} G(i, h, w)
where G(i, h, w) is the confidence of the graspable class at rotation-angle index i and image position (h, w); (h*, w*) is the position to be reached by the end of the mechanical arm in image space, and i* indicates that the end effector should rotate by the corresponding discrete angle ω̃_{i*} before grasping.
Further, during training a parameterized function f_θ is defined to realize a pixel-level mapping from the image to the grasp function map, which can be expressed as:
C_i = f_θ(Ĩ_i)
where Ĩ_i is image I rotated by the i-th discrete angle and C_i is the grasp function map corresponding to Ĩ_i; f_θ is implemented with a deep neural network. Combined with the loss function, the overall training objective can be defined by the following formula:
θ* = argmin_θ Σ_i L(Y_i, f_θ(Ĩ_i))
where Y_i is the label map.
Further, considering a scene in which only one object is placed in the workspace, c1 and c2 are defined as the contact points of the gripper's two fingers with the object, n1 and n2 are the corresponding surface normal vectors, and g is the grasp direction of the gripper in image space, with c1, c2, n1, n2, g ∈ ℝ². From the above definitions one obtains:
g = (c1 − c2) / ‖c1 − c2‖
where ‖·‖ denotes the norm operation.
A grasp is defined as an antipodal grasp when it satisfies the following conditions:
ω1 = arccos( g·n1 / (‖g‖ ‖n1‖) ),  ω2 = arccos( g·n2 / (‖g‖ ‖n2‖) ),
ω1 ≤ θ1,  ω2 ≥ θ2,
where θ1 and θ2 are non-negative thresholds, tending to 0 and π respectively, on the angle between the grasp direction and the surface normals at the two contact points, and ω1 and ω2 are the angles between the grasp direction and the surface normals at the two contact points; when the grasp direction of the gripper is parallel to the normal vectors at the contact points, the grasp is defined as a stable antipodal grasp.
All data are collected on the simulation platform without any real data, avoiding the problems that collecting data in a real environment may bring. Because the antipodal grasping rule is enforced on the simulation platform, the collected data effectively reflect the corresponding grasping pattern, and the trained model can be applied directly to real grasping scenes with only a very small amount of grasp data. The invention realizes end-to-end pixel-level grasp function prediction with a fully convolutional neural network; each output pixel can capture the global information of the input image, which lets the model learn more efficiently and make more accurate predictions.
Further, the step S4 includes:
S41, acquiring an RGB image and a depth image of the workspace with the camera;
S42, normalizing the RGB image and feeding it into the model at 16 rotation angles to obtain 16 grasp function maps;
S43, according to the grasping problem definition, taking and combining the first channel of each function map and finding the position of the maximum value, which gives the optimal grasp position and grasp angle in image space;
S44, mapping the obtained image position into 3-dimensional space, solving the mechanical arm control command by inverse kinematics, rotating the end effector by the grasp angle after it reaches the position directly above the object, and determining the descent height of the mechanical arm from the collected depth map to avoid collision.
Further, the step S42 specifically includes: the input to the fully convolutional neural network model is a global image of the whole workspace; ResNet50 is first used as the encoder to extract features, then four upsampling modules, each consisting of bilinear interpolation and convolution, are applied, and finally a 5x5 convolution produces a grasp function map with the same scale as the input.
Compared with the prior art, the beneficial effects are:
1. The invention provides a corrective grasping strategy based on the antipodal grasping rule; using this strategy, trial-and-error grasping on a simulation platform yields grasp samples that conform to the rule. Samples collected in this way clearly express the grasping pattern of the antipodal rule and benefit model learning. The whole data collection process needs neither manual intervention nor any real data, avoiding the problems that real data collection may bring.
2. With only a small amount of simulation data collected in this way, the trained model can be applied directly to different real grasping scenes. The whole training process needs neither domain adaptation nor domain randomization, and accuracy and robustness are high.
3. A fully convolutional deep neural network is designed; the network takes as input an image containing the information of the whole workspace and outputs a grasp function prediction for every pixel. This globally-input, pixel-level-prediction network structure learns the corresponding grasping patterns faster and better.
Drawings
FIG. 1 is a diagram illustrating the parameters defined in the antipodal grasping rule in the simulator according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a full convolution neural network of the present invention.
Detailed Description
The drawings are for illustration purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
Example 1:
Defining the grasping problem: a vertical planar grasp is defined as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0,1}³ is a 3-dimensional one-hot code representing the grasp function. The grasp function has three classes: graspable, non-graspable and background. Projected into image space, a grasp in image I can be represented as g̃ = (p̃, ω̃, η), where p̃ = (h, w) denotes the position of the grasp in the image and ω̃ ∈ {0, 1, …, 15} is the discretized grasp angle; discretization reduces the complexity of the learning process. Each pixel in the image can therefore be assigned a grasp function, so the whole grasp function map can be represented as C = {C_i}_{i=0}^{15}, where C_i ∈ ℝ^{H×W×3} is the grasp function map of the image at the i-th angle, H and W are the height and width of the image, and the 3 channels represent the graspable, non-graspable and background classes respectively. The first channel C_i^{(1)} ∈ ℝ^{H×W} (i.e. the graspable channel) is taken from each grasp function map C_i, and the channels are stacked into G ∈ ℝ^{16×H×W}.
Thus, the most robust grasp point can be obtained by solving the following formula:
(i*, h*, w*) = argmax_{i,h,w} G(i, h, w)
where G(i, h, w) is the confidence of the graspable class at rotation-angle index i and image position (h, w); (h*, w*) is the position to be reached by the end of the mechanical arm in image space, and i* indicates that the end effector should rotate by the corresponding discrete angle ω̃_{i*} before grasping.
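As a minimal illustration of this arg max (the helper name select_best_grasp and the use of NumPy are assumptions made for the sketch, not part of the original method description):

```python
import numpy as np

def select_best_grasp(G):
    """Return (angle_index, row, col) of the most confident graspable pixel.

    G is assumed to be a (16, H, W) array stacking the graspable channel of
    the 16 grasp function maps, one per discrete rotation angle.
    """
    i_star, h_star, w_star = np.unravel_index(np.argmax(G), G.shape)
    return int(i_star), int(h_star), int(w_star)

# Toy example: a random confidence volume for 16 angles over a 224x224 image.
G = np.random.rand(16, 224, 224)
print(select_best_grasp(G))
```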
During training, a parameterized function f_θ is defined to realize a pixel-level mapping from the image to the grasp function map, which can be expressed as:
C_i = f_θ(Ĩ_i)
where Ĩ_i is image I rotated by the i-th discrete angle and C_i is the grasp function map corresponding to Ĩ_i.
f_θ is implemented with a deep neural network and its parameters are learned by gradient descent: data are fed into the network to obtain a prediction, the prediction is compared with the ground-truth label to obtain an error, the error is back-propagated to obtain the gradient of every parameter in the network, and finally the parameters are updated with these gradients so that the network output moves closer to the ground-truth label; in this way a concrete expression of the function is learned.
Combined with the loss function, the overall training objective can be defined by the following formula:
θ* = argmin_θ Σ_i L(Y_i, f_θ(Ĩ_i))
where Y_i is the label map.
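A hedged sketch of one gradient-descent step towards this objective in PyTorch; model, optimizer and loss_fn are placeholders for f_θ, its optimizer and the masked cross-entropy loss described earlier, and the tensor shapes are assumptions:

```python
import torch

def train_step(model, optimizer, images, labels, masks, loss_fn):
    """One gradient-descent update of f_theta on a batch of rotated images.

    images: (B, 3, H, W) normalized RGB images, each already rotated to its angle.
    labels: (B, H, W) integer label map (e.g. 0 graspable, 1 non-graspable, 2 background).
    masks:  (B, H, W) per-pixel loss weights (the label mask).
    loss_fn: the masked cross-entropy combining prediction, label map and label mask.
    """
    model.train()
    optimizer.zero_grad()
    logits = model(images)                 # (B, 3, H, W) grasp-function logits
    loss = loss_fn(logits, labels, masks)  # compare prediction with the label map
    loss.backward()                        # back-propagate to get parameter gradients
    optimizer.step()                       # gradient-descent update of theta
    return loss.item()
```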
Collecting simulation data: the invention defines the rule for antipodal grasping of an object in image space. Consider a scene in which only one object is placed in the workspace. c1 and c2 are defined as the contact points of the two fingers with the object, n1 and n2 are their corresponding surface normal vectors, and g is the grasp direction of the gripper in image space, with c1, c2, n1, n2, g ∈ ℝ², as shown in FIG. 1. From the above definitions one obtains:
g = (c1 − c2) / ‖c1 − c2‖
where ‖·‖ denotes the norm operation. The invention defines a grasp as an antipodal grasp when it satisfies the following conditions:
ω1 = arccos( g·n1 / (‖g‖ ‖n1‖) ),  ω2 = arccos( g·n2 / (‖g‖ ‖n2‖) ),
ω1 ≤ θ1,  ω2 ≥ θ2,
where θ1 and θ2 are non-negative thresholds, tending to 0 and π respectively, on the angle between the grasp direction and the surface normals at the two contact points, and ω1 and ω2 are the angles between the grasp direction and the surface normals at the two contact points. In general, when the grasp direction of the gripper is parallel to the normal vectors at the contact points, the grasp is defined as a stable antipodal grasp.
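A minimal sketch of this antipodal check in NumPy; the concrete threshold values (15° and 165°) are illustrative assumptions, since the text only requires θ1 near 0 and θ2 near π:

```python
import numpy as np

def is_antipodal(c1, c2, n1, n2, theta1=np.deg2rad(15), theta2=np.deg2rad(165)):
    """Check the antipodal condition for a two-finger grasp in image space.

    c1, c2: 2-D contact points of the two fingers on the object.
    n1, n2: surface normal vectors at those contact points.
    """
    g = (c1 - c2) / np.linalg.norm(c1 - c2)          # grasp direction
    cos1 = np.dot(g, n1) / (np.linalg.norm(g) * np.linalg.norm(n1))
    cos2 = np.dot(g, n2) / (np.linalg.norm(g) * np.linalg.norm(n2))
    omega1 = np.arccos(np.clip(cos1, -1.0, 1.0))     # angle to the normal at c1
    omega2 = np.arccos(np.clip(cos2, -1.0, 1.0))     # angle to the normal at c2
    return omega1 <= theta1 and omega2 >= theta2

# Example: two contacts on opposite sides of an object, normals (anti)parallel to g.
c1, c2 = np.array([10.0, 0.0]), np.array([-10.0, 0.0])
n1, n2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
print(is_antipodal(c1, c2, n1, n2))   # True
```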
In a practical implementation, the invention uses a corrective grasping strategy to collect samples that satisfy the antipodal grasping rule. First, a grasp angle and a pixel position containing the object are selected at random, and the camera records the whole workspace. The mechanical arm is then controlled to attempt a trial-and-error grasp. If the grasp fails, the workspace image I, the grasp pixel position p, the set C of all pixel positions occupied by the object in the image, the grasp angle ψ and the label l are stored. If the grasp succeeds, the contact between the gripper and the object changes the object's pose; this corrective change makes the grasp direction of the gripper approximately parallel to the normal vectors at the contact points, so the antipodal grasping rule is satisfied. At that moment the camera records the corrected image I′ again, the object pixel set C′ is obtained again, and the image, the grasp-point pixel position, the set of all pixel positions occupied by the object, the grasp angle and the label are stored.
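A schematic sketch of this corrective collection loop under an assumed simulator interface; sim.capture_image, sim.object_pixels and sim.try_grasp are hypothetical placeholder calls, not an API named here:

```python
import random

def collect_sample(sim, num_angles=16):
    """Collect one grasp sample with the corrective strategy.

    Failed grasps are stored with the pre-grasp image; for successful grasps the
    scene is re-imaged after the gripper contact has re-aligned the object, so
    the stored sample satisfies the antipodal rule.
    """
    image, object_pixels = sim.capture_image(), sim.object_pixels()
    p = random.choice(list(object_pixels))            # random pixel on the object
    angle_index = random.randrange(num_angles)        # random discrete grasp angle
    success = sim.try_grasp(pixel=p, angle_index=angle_index)

    if not success:
        return dict(image=image, pixels=object_pixels, grasp=p,
                    angle=angle_index, label=0)
    # Record the corrected scene: image I' and object pixel set C'.
    image_c, object_pixels_c = sim.capture_image(), sim.object_pixels()
    return dict(image=image_c, pixels=object_pixels_c, grasp=p,
                angle=angle_index, label=1)
```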
Defining a network structure:
the network structure is shown in fig. 2. The method adopts a full-volume machine neural network, inputs the global image containing the whole working space, firstly uses Resnet50 as an encoder to extract features, then uses an up-sampling module with four layers of bilinear interpolation and convolution, and optimally uses a 5x5 convolution to obtain a grabbing function graph with the input through scale.
Defining a loss function:
because most pixels in an image belong to the background class and the graspable and non-graspable labels are very sparse, training directly with such data can be very inefficient. The invention therefore proposes to calculate the loss function in combination with a label mask. For pixels belonging to the object but not subjected to trial-and-error capture, the value of the pixel at the position corresponding to the label mask is set as
Figure GDA0002683904630000081
For other pixels, the value of the position corresponding to the label mask is set as
Figure GDA0002683904630000082
Is provided with
Figure GDA0002683904630000083
The output characteristic diagram of the last convolutional layer is shown. The corresponding loss function is therefore:
Figure GDA0002683904630000084
Figure GDA0002683904630000085
indicating the label graph corresponding to the sample, H and W are the length and width of the label graph respectively, i, j and k are index subscripts of the position in the 3-channel image respectively, l is the index of the channel number,
Figure GDA0002683904630000086
an output characteristic diagram representing the last convolutional layer;
Figure GDA0002683904630000087
representing the real domain, the corresponding superscript represents the dimension of the tensor.
To reduce the influence of label sparsity, the invention increases the loss weight of the graspable and non-graspable classes and decreases the loss weight of the background: mask positions of graspable and non-graspable labels are multiplied by 120, while the background area is multiplied by 0.1.
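A minimal PyTorch sketch of this masked, re-weighted cross-entropy; the weights 120 and 0.1 come from the text, while the class ordering and the normalization by the mask sum are assumptions:

```python
import torch
import torch.nn.functional as F

def masked_ce_loss(logits, labels, label_mask):
    """Cross-entropy over the grasp-function map, weighted by the label mask.

    logits:     (B, 3, H, W) output feature map of the last conv layer.
    labels:     (B, H, W) long tensor of class indices
                (0 graspable, 1 non-graspable, 2 background).
    label_mask: (B, H, W) per-pixel weight: 0 for untried object pixels,
                120 for graspable/non-graspable labels, 0.1 for background.
    """
    per_pixel = F.cross_entropy(logits, labels, reduction="none")   # (B, H, W)
    weighted = per_pixel * label_mask
    # Normalizing by the mask sum keeps the loss scale independent of image size
    # (a choice made for this sketch; a plain sum would also match the formula).
    return weighted.sum() / label_mask.sum().clamp(min=1.0)
```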
The method comprises the following specific implementation steps:
step 1: in the simulation environment, an environment similar to a real scene is built.
Step 1.1: a background texture, a mechanical arm with a gripper, a camera and an object to be grabbed are placed in a working space of the simulation environment.
Step 1.2: place an object in the workspace, select with the camera a position where the object is present, record the image, the pixel position corresponding to the grasp point, the mask of the object in the image and the grasp angle, and then select a random angle for the mechanical arm to attempt a trial-and-error grasp.
Step 1.3: judge whether the grasp succeeds. If it fails, directly store the image I, the set C of pixel positions occupied by the object in the image, the pixel position p of the grasp point, the grasp angle ψ and the failure label l. If it succeeds, record again the global image I′ and the corresponding set C′ of pixel positions occupied by the object, and then store I′, C′, the grasp-point pixel position p, the grasp angle ψ and the success label l. The acquired global image is the global image defined in the grasping problem above, and the grasp angle and grasp position are likewise defined in image space.
Step 2: the data is pre-processed.
Step 2.1: generate the object mask from the set of pixel positions occupied by the object in the image, generate the label mask from the object mask, the grasp pixel position and the grasp label, and generate the label map from the grasp position and the grasp label. In the label mask, the weights of the graspable and non-graspable regions are increased and the weight of the background is decreased.
Step 2.2: discretize the grasp angle according to the problem definition. In this step the image is rotated to 16 discrete angles, and the corresponding label map and label mask are rotated accordingly; because only horizontal grasps are considered, only the data in which the grasp direction is parallel to the horizontal direction after rotation are retained.
Step 2.3: the preprocessed data includes: the system comprises a global image containing the information of the whole working space, an object mask and a label graph with the same scale as the global image.
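The label-map and label-mask generation of steps 2.1 and 2.2 can be sketched as follows; the class indices 0/1/2 and the helper name are assumptions chosen for illustration:

```python
import numpy as np

GRASPABLE, NON_GRASPABLE, BACKGROUND = 0, 1, 2

def make_label_and_mask(shape, object_pixels, grasp_pixel, grasp_succeeded):
    """Build the per-pixel label map and weighted label mask for one sample.

    shape: (H, W) of the global image.
    object_pixels: iterable of (row, col) pixels covered by the object (set C).
    grasp_pixel: (row, col) of the trial grasp (p).
    grasp_succeeded: boolean grasp label l.
    """
    label = np.full(shape, BACKGROUND, dtype=np.int64)     # int64 for cross-entropy targets
    mask = np.full(shape, 0.1, dtype=np.float32)           # background weight
    for (r, c) in object_pixels:
        mask[r, c] = 0.0                                   # object but untried: ignored
    r, c = grasp_pixel
    label[r, c] = GRASPABLE if grasp_succeeded else NON_GRASPABLE
    mask[r, c] = 120.0                                     # tried pixel: high weight
    return label, mask
```

The rotation of image, label map and label mask to the 16 discrete angles can then be done with any image-rotation routine, keeping only the rotations in which the grasp direction becomes horizontal.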
And step 3: and training the deep neural network.
Step 3.1: normalize the input RGB images and assemble them into a batch.
Step 3.2: feed the batch into the fully convolutional neural network defined above to obtain the output.
Step 3.3: compute the error between the prediction and the label with the cross-entropy loss combined with the label mask:
L(Y, F) = − Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{3} M_{ijk} Y_{ijk} log( exp(F_{ijk}) / Σ_{l=1}^{3} exp(F_{ijl}) )
where Y is the label map, M is the label mask, H and W are the height and width of the label map, i, j and k index positions in the 3-channel map, l indexes the channels, F ∈ ℝ^{H×W×3} is the output feature map of the last convolutional layer, and ℝ denotes the real number domain with the superscript giving the dimensions of the tensor.
And 4, step 4: and applying the trained model to a real grabbing environment.
Step 4.1: acquire an RGB image and a depth map of the workspace with the camera.
Step 4.2: normalize the RGB image and feed it into the fully convolutional neural network model at 16 rotation angles to obtain 16 grasp function maps.
Step 4.3: according to the grasping problem definition, take and combine the first channel of each function map and find the position of the maximum value, which gives the optimal grasp position and grasp angle in image space.
Step 4.4: map the obtained image position into 3-dimensional space, solve the mechanical arm control command by inverse kinematics, rotate the end effector by the grasp angle after it reaches the position directly above the object, and determine the descent height of the mechanical arm from the acquired depth map to avoid collision.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (7)

1. A mechanical arm autonomous grabbing method based on vision is characterized by comprising the following steps:
S1, building an environment similar to the real scene in simulation and collecting global images;
S2, processing the data, where the preprocessed data comprise a global image containing the information of the whole workspace, an object mask, and a label map with the same scale as the global image; the processing comprises: first generating an object mask from the set of pixel positions occupied by the object in the image, then generating a label mask from the object mask, the grasp pixel position and the grasp label, and generating a label map from the grasp position and the grasp label; then discretizing the grasp angle according to the grasping problem definition;
the grasping problem is defined as follows: a vertical planar grasp is defined as g = (p, ω, η), where p = (x, y, z) denotes the position of the grasp point in Cartesian coordinates, ω ∈ [0, 2π) denotes the rotation angle of the end effector, and η ∈ {0,1}³ is a 3-dimensional one-hot code representing the grasp function; the grasp function has three classes: graspable, non-graspable and background; projected into image space, a grasp in image I can be represented as g̃ = (p̃, ω̃, η), where p̃ = (h, w) denotes the position of the grasp in the image and ω̃ ∈ {0, 1, …, 15} is the discretized grasp angle; each pixel in the image can be assigned a grasp function, so the whole grasp function map can be represented as C = {C_i}_{i=0}^{15}, where C_i ∈ ℝ^{H×W×3} is the grasp function map of the image at the i-th angle, H and W are the height and width of the image, and the 3 channels represent the graspable, non-graspable and background classes respectively; the first (graspable) channel C_i^{(1)} ∈ ℝ^{H×W} is taken from each grasp function map C_i, and the channels are stacked into G ∈ ℝ^{16×H×W};
S3, training a deep neural network:
(1) normalizing the input RGB images and assembling them into a batch;
(2) feeding the batch into a fully convolutional neural network to obtain the output;
(3) computing the error between the prediction and the label with a cross-entropy loss combined with the label mask:
L(Y, F) = − Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{3} M_{ijk} Y_{ijk} log( exp(F_{ijk}) / Σ_{l=1}^{3} exp(F_{ijl}) )
where Y is the label map, M is the label mask, H and W are the height and width of the label map, i, j and k index positions in the 3-channel map, l indexes the channels, F ∈ ℝ^{H×W×3} is the output feature map of the last convolutional layer, and ℝ denotes the real number domain with the superscript giving the dimensions of the tensor;
and S4, applying the trained model to a real grabbing environment.
2. The vision-based mechanical arm autonomous grasping method according to claim 1, wherein the step S1 specifically includes:
S11, placing a background texture, a mechanical arm with a gripper, a camera and an object to be grasped in the workspace of the simulation environment;
S12, placing an object in the workspace, selecting with the camera a position where the object is present, recording the image, the pixel position corresponding to the grasp point, the mask of the object in the image and the grasp angle, and then selecting a random angle for the mechanical arm to attempt a trial-and-error grasp;
S13, judging whether the grasp succeeds: if it fails, directly storing the image I, the set C of pixel positions occupied by the object in the image, the pixel position p of the grasp point, the grasp angle ψ and the failure label l; if it succeeds, recording again the global image I′ and the corresponding set C′ of pixel positions occupied by the object, and then storing I′, C′, the grasp-point pixel position p, the grasp angle ψ and the success label l.
3. The vision-based mechanical arm autonomous grasping method according to claim 2, wherein the most robust grasp point is obtained by solving the following formula:
(i*, h*, w*) = argmax_{i,h,w} G(i, h, w)
where G(i, h, w) is the confidence of the graspable class at rotation-angle index i and image position (h, w); (h*, w*) is the position to be reached by the end of the mechanical arm in image space, and i* indicates that the end effector should rotate by the corresponding discrete angle ω̃_{i*} before grasping.
4. The vision-based mechanical arm autonomous grasping method according to claim 3, wherein during training a parameterized function f_θ is defined to realize a pixel-level mapping from the image to the grasp function map, which can be expressed as:
C_i = f_θ(Ĩ_i)
where Ĩ_i is image I rotated by the i-th discrete angle and C_i is the grasp function map corresponding to Ĩ_i; f_θ is implemented with a deep neural network; combined with the loss function, the overall training objective can be defined by the following formula:
θ* = argmin_θ Σ_i L(Y_i, f_θ(Ĩ_i))
where Y_i is the label map.
5. The vision-based mechanical arm autonomous grasping method according to claim 4, wherein, considering a scene in which only one object is placed in the workspace, c1 and c2 are defined as the contact points of the gripper's two fingers with the object, n1 and n2 are the corresponding surface normal vectors, and g is the grasp direction of the gripper in image space, with c1, c2, n1, n2, g ∈ ℝ²; from the above definitions one obtains:
g = (c1 − c2) / ‖c1 − c2‖
where ‖·‖ denotes the norm operation;
a grasp is defined as an antipodal grasp when it satisfies the following conditions:
ω1 = arccos( g·n1 / (‖g‖ ‖n1‖) ),  ω2 = arccos( g·n2 / (‖g‖ ‖n2‖) ),
ω1 ≤ θ1,  ω2 ≥ θ2,
where θ1 and θ2 are non-negative thresholds, tending to 0 and π respectively, on the angle between the grasp direction and the surface normals at the two contact points, and ω1 and ω2 are the angles between the grasp direction and the surface normals at the two contact points; when the grasp direction of the gripper is parallel to the normal vectors at the contact points, the grasp is defined as a stable antipodal grasp.
6. The vision-based robotic arm autonomous grasping method according to claim 5, wherein the step S4 includes:
S41, acquiring an RGB image and a depth image of the workspace with the camera;
S42, normalizing the RGB image and feeding it into the fully convolutional neural network model at 16 rotation angles to obtain 16 grasp function maps;
S43, according to the grasping problem definition, taking and combining the first channel of each function map and finding the position of the maximum value, which gives the optimal grasp position and grasp angle in image space;
S44, mapping the obtained image position into 3-dimensional space, solving the mechanical arm control command by inverse kinematics, rotating the end effector by the grasp angle after it reaches the position directly above the object, and determining the descent height of the mechanical arm from the collected depth map to avoid collision.
7. The vision-based mechanical arm autonomous grasping method according to claim 6, wherein the step S42 specifically includes: the input to the fully convolutional neural network model is a global image of the whole workspace; ResNet50 is first used as the encoder to extract features, then four upsampling modules, each consisting of bilinear interpolation and convolution, are applied, and finally a 5x5 convolution produces a grasp function map with the same scale as the input.
CN201910335507.5A 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision Active CN110238840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910335507.5A CN110238840B (en) 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910335507.5A CN110238840B (en) 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision

Publications (2)

Publication Number Publication Date
CN110238840A CN110238840A (en) 2019-09-17
CN110238840B true CN110238840B (en) 2021-01-29

Family

ID=67883271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910335507.5A Active CN110238840B (en) 2019-04-24 2019-04-24 Mechanical arm autonomous grabbing method based on vision

Country Status (1)

Country Link
CN (1) CN110238840B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889460B (en) * 2019-12-06 2023-05-23 中山大学 Mechanical arm specified object grabbing method based on cooperative attention mechanism
CN111127548B (en) * 2019-12-25 2023-11-24 深圳市商汤科技有限公司 Grabbing position detection model training method, grabbing position detection method and grabbing position detection device
CN111325795B (en) * 2020-02-25 2023-07-25 深圳市商汤科技有限公司 Image processing method, device, storage medium and robot
CN111590577B (en) * 2020-05-19 2021-06-15 台州中盟联动企业管理合伙企业(有限合伙) Mechanical arm multi-parameter digital frequency conversion control method and device
CN112465825A (en) * 2021-02-02 2021-03-09 聚时科技(江苏)有限公司 Method for acquiring spatial position information of part based on image processing
CN116197887B (en) * 2021-11-28 2024-01-30 梅卡曼德(北京)机器人科技有限公司 Image data processing method, device, electronic equipment and storage medium for generating grabbing auxiliary image
CN114407011B (en) * 2022-01-05 2023-10-13 中科新松有限公司 Special-shaped workpiece grabbing planning method, planning device and special-shaped workpiece grabbing method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7076313B2 (en) * 2003-06-06 2006-07-11 Visteon Global Technologies, Inc. Method for optimizing configuration of pick-and-place machine
KR101211601B1 (en) * 2010-11-05 2012-12-12 한국과학기술연구원 Motion Control System and Method for Grasping Object with Dual Arms of Robot
US8843236B2 (en) * 2012-03-15 2014-09-23 GM Global Technology Operations LLC Method and system for training a robot using human-assisted task demonstration
US20150314439A1 (en) * 2014-05-02 2015-11-05 Precision Machinery Research & Development Center End effector controlling method
US10394327B2 (en) * 2014-09-12 2019-08-27 University Of Washington Integration of auxiliary sensors with point cloud-based haptic rendering and virtual fixtures
CN109074513B (en) * 2016-03-03 2020-02-18 谷歌有限责任公司 Deep machine learning method and device for robot gripping
JP2018051704A (en) * 2016-09-29 2018-04-05 セイコーエプソン株式会社 Robot control device, robot, and robot system
CN106874914B (en) * 2017-01-12 2019-05-14 华南理工大学 A kind of industrial machinery arm visual spatial attention method based on depth convolutional neural networks
CN106846463B (en) * 2017-01-13 2020-02-18 清华大学 Microscopic image three-dimensional reconstruction method and system based on deep learning neural network
CN106914897A (en) * 2017-03-31 2017-07-04 长安大学 Inverse Solution For Manipulator Kinematics method based on RBF neural
CN109407603B (en) * 2017-08-16 2020-03-06 北京猎户星空科技有限公司 Method and device for controlling mechanical arm to grab object
CN108161934B (en) * 2017-12-25 2020-06-09 清华大学 Method for realizing robot multi-axis hole assembly by utilizing deep reinforcement learning
CN108415254B (en) * 2018-03-12 2020-12-11 苏州大学 Waste recycling robot control method based on deep Q network
CN109483534B (en) * 2018-11-08 2022-08-02 腾讯科技(深圳)有限公司 Object grabbing method, device and system

Also Published As

Publication number Publication date
CN110238840A (en) 2019-09-17

Similar Documents

Publication Publication Date Title
CN110238840B (en) Mechanical arm autonomous grabbing method based on vision
CN108491880B (en) Object classification and pose estimation method based on neural network
Cao et al. Suctionnet-1billion: A large-scale benchmark for suction grasping
CN108280856B (en) Unknown object grabbing pose estimation method based on mixed information input network model
CN111079561B (en) Robot intelligent grabbing method based on virtual training
DE202017106506U1 (en) Device for deep machine learning to robot grip
CN111331607B (en) Automatic grabbing and stacking method and system based on mechanical arm
Tang et al. Learning collaborative pushing and grasping policies in dense clutter
CN110969660A (en) Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning
CN113172629A (en) Object grabbing method based on time sequence tactile data processing
CN113762159B (en) Target grabbing detection method and system based on directional arrow model
CN115861780B (en) Robot arm detection grabbing method based on YOLO-GGCNN
CN114998573B (en) Grabbing pose detection method based on RGB-D feature depth fusion
CN113752255A (en) Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning
CN110171001A (en) A kind of intelligent sorting machinery arm system based on CornerNet and crawl control method
CN115147488A (en) Workpiece pose estimation method based on intensive prediction and grasping system
CN116214524A (en) Unmanned aerial vehicle grabbing method and device for oil sample recovery and storage medium
CN114131603B (en) Deep reinforcement learning robot grabbing method based on perception enhancement and scene migration
CN113681552B (en) Five-dimensional grabbing method for robot hybrid object based on cascade neural network
Ito et al. Integrated learning of robot motion and sentences: Real-time prediction of grasping motion and attention based on language instructions
Wu et al. A cascaded CNN-based method for monocular vision robotic grasping
CN114211490A (en) Robot arm gripper pose prediction method based on Transformer model
CN113894058A (en) Quality detection and sorting method and system based on deep learning and storage medium
CN110889460B (en) Mechanical arm specified object grabbing method based on cooperative attention mechanism
Li et al. Grasping Detection Based on YOLOv3 Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant