CN113326666A - Robot intelligent grabbing method based on convolutional neural network differentiable structure searching - Google Patents

Robot intelligent grabbing method based on convolutional neural network differentiable structure searching

Info

Publication number
CN113326666A
Authority
CN
China
Prior art keywords
neural network
grabbing
optimization
calculation
robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110802383.4A
Other languages
Chinese (zh)
Other versions
CN113326666B (en)
Inventor
胡伟飞
焦清
邵金毅
王楚璇
刘振宇
谭建荣
刘飞香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
China Railway Construction Heavy Industry Group Co Ltd
Original Assignee
Zhejiang University ZJU
China Railway Construction Heavy Industry Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, China Railway Construction Heavy Industry Group Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202110802383.4A priority Critical patent/CN113326666B/en
Publication of CN113326666A publication Critical patent/CN113326666A/en
Application granted granted Critical
Publication of CN113326666B publication Critical patent/CN113326666B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00Details relating to the application field
    • G06F2113/28Fuselage, exterior or interior

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a robot intelligent grabbing method based on differentiable structure searching of convolutional neural networks. The method first constructs a training set and a verification set, then builds a discrete chain search space and relaxes it to a continuous one. A gradient-based neural network double-layer optimization model, taking both the calculation speed and the accuracy of the neural network as optimization targets, is established to optimize the grabbing attitude neural network, finally yielding a grabbing attitude generation network with optimal parameters. Inputting a new RGB-D image into the trained network generates the optimal grabbing posture. Because grabbing quality judgment and grabbing posture generation are both completed by a fully convolutional neural network, the method greatly improves the calculation efficiency of the neural network and solves the problem of excessive calculation in the optimization process.

Description

Robot intelligent grabbing method based on convolutional neural network differentiable structure searching
Technical Field
The invention relates to the field of robot grabbing algorithms, in particular to a robot intelligent grabbing method based on convolutional neural network differentiable structure searching.
Background
In robot intelligent grabbing, a hand-eye robot system grabs objects of unknown shapes and colors and places them in a specified area. The core of realizing intelligent grabbing is to obtain an effective grabbing gesture from an image or digital model containing the color and shape information of the object.
Existing robot intelligent grabbing methods can be divided into physical analysis methods and empirical model methods. A physical analysis method obtains a suitable grabbing gesture directly from a three-dimensional model of the object through mechanical analysis; such methods are computationally expensive, simplify away many real-world constraints, and suffer from poor generalization and long calculation time. An empirical model method, mainly based on deep learning, learns the grabbing mode of specific objects from a data set, typically using a convolutional neural network to process a depth image of the object and predict the grabbing posture.
However, robot intelligent grabbing demands higher recognition accuracy and faster operation speed from the convolutional neural network, and most neural networks currently applied to robot grabbing are designed manually by experts in the field of deep learning according to professional knowledge and experience, which consumes a large amount of computing resources and time.
Disclosure of Invention
Aiming at the defects of the prior art, the invention discloses a robot intelligent grabbing method based on convolutional neural network differentiable structure searching. The specific technical scheme is as follows:
a robot intelligent grabbing method based on convolutional neural network differentiable structure searching comprises the following steps:
(1) constructing a training set and a verification set of a grabbing posture generation network, wherein the training set and the verification set each comprise network inputs and outputs, the network input being an RGB-D image and the output being the grabbing quality Q, grabbing angle Φ and grabbing opening W corresponding to each pixel point;
(2) constructing a chain search space consisting of a plurality of nodes, and determining candidate convolution calculation operation among the nodes;
(3) relaxing the discrete chained search space to continuous;
(4) establishing a gradient-based neural network double-layer optimization model that takes both the calculation speed and the accuracy of the neural network as optimization targets, wherein the double-layer model comprises an inner-layer optimization and an outer-layer optimization: the inner-layer optimization trains all weight coefficients w of the neural network on the training set, and the outer-layer optimization trains the neural network operation variables α on the verification set given the trained weight coefficients w*;
then selecting the operation variables α to compose a convolutional neural network, and retraining the weight coefficients on the training set to obtain a grabbing attitude generation network with optimal parameters;
(5) inputting an RGB-D image shot by a depth camera at the end of the robot into the grabbing gesture generation network with optimal parameters, and outputting three single-channel characteristic images, of the same length and width as the input image, giving the grabbing quality Q, grabbing angle Φ and grabbing opening W corresponding to each pixel point;
(6) selecting the pixel point with the largest grabbing quality Q from the images obtained in step (5), taking its position as the central position of the grabbing frame, and controlling the robot and the mechanical claw through the upper computer to grab the object.
Further, the training set and the verification set in step (1) are obtained from an existing robot intelligent grabbing data set that provides RGB-D images together with grabbing frames of successful grabs; the grabbing quality Q, grabbing angle Φ and grabbing opening W of each pixel point are generated by the following preprocessing of the three characteristics:
equally dividing each grabbing frame into three parts along the grabbing width direction; for the part located in the center, filling the grabbing quality q with 1, filling the rotation angle Φ of the grabbing frame relative to the picture, and filling the grabbing opening w, wherein Φ ∈ [−π/2, π/2] and w ∈ [0, 150]; the two parts located on both sides are filled with grabbing quality q = 0.
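For illustration, this preprocessing can be sketched in Python as follows; the rectangle tuple format and the use of OpenCV rasterization are assumptions, not part of the patent, and must be adapted to the data set actually used:

```python
import numpy as np
import cv2

def rects_to_label_maps(h, w, rects):
    """Rasterize ground-truth grabbing frames into the three per-pixel
    label maps Q (quality), Phi (angle) and W (opening width).
    `rects` holds (cx, cy, angle_rad, opening, jaw_span) tuples -- an
    assumed format; adapt the field order to the data set actually used."""
    Q = np.zeros((h, w), np.float32)
    Phi = np.zeros((h, w), np.float32)
    W = np.zeros((h, w), np.float32)
    for cx, cy, angle, opening, jaw_span in rects:
        # keep only the central third of the frame along the grabbing
        # width direction; the two outer thirds stay at q = 0
        box = cv2.boxPoints(((cx, cy), (opening, jaw_span / 3.0),
                             np.degrees(angle)))
        mask = np.zeros((h, w), np.uint8)
        cv2.fillPoly(mask, [box.astype(np.int32)], 1)
        m = mask.astype(bool)
        Q[m] = 1.0                        # graspable
        Phi[m] = angle                    # filled rotation angle
        W[m] = np.clip(opening, 0, 150)   # filled opening in [0, 150]
    return Q, Phi, W
```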
Furthermore, the chain search space is composed of a plurality of nodes, each node representing an intermediate result after a calculation operation; the nodes are connected by directed arrow lines, each of which represents all possible candidate neural network calculation operations. A neural network calculation operation refers to a calculation between two nodes using convolution kernels of different sizes and different numbers of convolution layers. The nodes are connected in a chain, which maximizes the use of computing resources and accelerates the convergence of the optimization algorithm.
Further, the step (3) of relaxing the discrete chain search space to be continuous is realized by the following steps:
assigning a normalized continuous variable α to the operations between the original nodes, so that the discrete operations are expressed by the continuous variable α; specifically, each directed arrow line between nodes is multiplied by its corresponding variable α, and the results are summed as the final calculation result:

$$x_j = \sum_{i=1}^{n_j} \frac{e^{\alpha_i^{(j)}}}{\sum_{k=1}^{n_j} e^{\alpha_k^{(j)}}}\, o_i^{(j)}(x_{j-1}) \tag{1}$$

where e is the base of the natural logarithm, $x_j$ is the output of the j-th node, $\alpha_i^{(j)}$ is the i-th operation variable in the j-th node, $o_i^{(j)}$ is the i-th operation equation in the j-th node, and $n_j$ is the number of operations contained in the j-th node.
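As an illustration of formula (1), the sketch below implements one chain node as a softmax-weighted mixture of candidate operations in PyTorch; the candidate set of 1×1, 3×3 and 5×5 convolutions is an assumption, since the patent does not fix it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One node of the chain search space: the output is the
    softmax(alpha)-weighted sum of all candidate convolutions,
    implementing formula (1). The candidate set (1x1, 3x3, 5x5
    kernels) is illustrative."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2)
            for k in (1, 3, 5))
        # one architecture variable alpha_i per candidate operation
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)    # normalize the alphas
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# chain-structured search space: nodes connected head to tail
supernet = nn.Sequential(*(MixedOp(channels=32) for _ in range(4)))
```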
Further, the inner-layer optimization trains all weight coefficients w of the neural network by computing a loss function on the training set, with the operation variables α between all neural network nodes fixed; the outer-layer optimization trains the neural network operation variables α by computing a loss function on the verification set, given the trained weight coefficients w*.
The specific calculation is given by formulas (2) and (3), where formula (3) is the inner optimization function and formula (2) is the outer optimization function; in order to take both calculation accuracy and time as optimization targets, a delay factor is introduced into the outer optimization function, which adjusts the loss function by the quotient of the floating-point operation count of the current neural network and that of the target neural network:

$$\min_{\alpha}\ \mathcal{L}_{val}\bigl(w^*(\alpha),\,\alpha\bigr)\cdot\Bigl(\tfrac{F(m)}{T}\Bigr)^{\omega} \tag{2}$$

$$\text{s.t.}\quad w^*(\alpha)=\arg\min_{w}\ \mathcal{L}_{train}(w,\,\alpha) \tag{3}$$

where $\mathcal{L}_{val}$ is the loss function calculated on the verification set, $\mathcal{L}_{train}$ is the loss function calculated on the training set, F is the equation of the floating-point operation count of the neural network, m is the neural network structure obtained through discretization, T is the set target floating-point operation count, and ω is a constant that controls the magnitude of the delay factor effect.
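The effect of the delay factor can be seen in a short sketch; the value ω = 0.5 is purely illustrative, since the patent only states that ω is a constant:

```python
import torch

def outer_loss(val_loss: torch.Tensor, flops_current: float,
               flops_target: float, omega: float = 0.5) -> torch.Tensor:
    """Latency-aware outer objective of formula (2): scale the
    validation loss by the delay factor (F(m)/T)**omega so that
    networks exceeding the FLOPs budget are penalized."""
    return val_loss * (flops_current / flops_target) ** omega

# e.g. a candidate needing 1.5x the FLOPs budget has its loss inflated:
print(outer_loss(torch.tensor(0.20), flops_current=3.0e9,
                 flops_target=2.0e9))   # 0.20 * 1.5**0.5 ~= 0.245
```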
Further, in step (4), in order to enable the outer function to determine the correct convergence gradient more quickly, the inner function is iterated until it approaches convergence before the outer function is updated; meanwhile, convergence is judged by observing the loss sets obtained over multiple iterations of the inner function, which prevents the optimization from stopping at a local optimum. The convergence criterion of the inner function is defined as follows:

$$\frac{G_k^{\max}-G_k^{\min}}{\bar{G}_k}\le\varepsilon_1,\qquad \frac{\bigl|\bar{G}_k-\bar{G}_{k-1}\bigr|}{\bar{G}_{k-1}}\le\varepsilon_2$$

with

$$G_k=\bigl\{\mathcal{L}_{train}(w_i)\ \big|\ i=(k-1)N+1,\dots,kN\bigr\},\quad \bar{G}_k=\frac{1}{N}\sum_{\ell\in G_k}\ell,\quad G_k^{\max}=\max G_k,\quad G_k^{\min}=\min G_k$$

where N is the number of inner-function calculations in each group, $G_k$ is the k-th set of inner-function loss values, $\bar{G}_k$ is the mean of the k-th loss value set, $G_k^{\max}$ is its maximum, $G_k^{\min}$ is its minimum, $w_i$ is the weight coefficient of the network obtained at the i-th optimization iteration, $\varepsilon_1$ is the fluctuation convergence threshold of the loss function set, and $\varepsilon_2$ is the mean convergence threshold of the loss function set.
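A sketch of this convergence test over groups of N consecutive inner-loop losses follows; N, ε1 and ε2 are illustrative values, and the exact inequalities follow the reconstruction above:

```python
import statistics

def inner_converged(losses, N=20, eps1=0.05, eps2=0.01):
    """Convergence test on groups of N consecutive inner-loop losses:
    the last group must fluctuate little (eps1) and its mean must be
    close to the previous group's mean (eps2). Thresholds are
    illustrative; the patent leaves eps1 and eps2 to the practitioner."""
    if len(losses) < 2 * N:
        return False
    G_prev, G_k = losses[-2 * N:-N], losses[-N:]
    mean_prev, mean_k = statistics.mean(G_prev), statistics.mean(G_k)
    fluctuation = (max(G_k) - min(G_k)) / mean_k
    drift = abs(mean_k - mean_prev) / mean_prev
    return fluctuation < eps1 and drift < eps2
```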
Further, in step (4), after the optimization is completed, the operation variables α of the neural network are ranked; the single operation with the highest α value (or the top several operations with higher α values) is selected to compose the final convolutional neural network, and the weight coefficients w are retrained on the training set to obtain the trained neural network.
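Continuing the MixedOp sketch above, discretization reduces to an argmax per node:

```python
import torch
import torch.nn as nn

def discretize(supernet: nn.Sequential) -> nn.Sequential:
    """Keep, in every chain node, only the candidate operation with the
    largest architecture variable alpha (the patent also allows keeping
    the top several). Assumes the MixedOp sketch shown earlier."""
    chosen = [node.ops[int(torch.argmax(node.alpha))] for node in supernet]
    return nn.Sequential(*chosen)
```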
Further, the physical grabbing environment where the robot is located comprises a physical robot, two parallel adaptive clamping jaws, a depth camera and an object set to be grabbed; the two parallel self-adaptive clamping jaws and the depth camera are fixed at the tail end of the physical robot, and the relative positions of the two parallel self-adaptive clamping jaws and the depth camera are unchanged in the motion process; the two parallel self-adaptive clamping jaws are perpendicular to the grabbing plane.
The invention has the following beneficial effects:
(1) compared with other intelligent grabbing methods, the grabbing posture generation network provided by the invention avoids candidate posture sampling and candidate grabbing evaluation in the color-depth picture; grabbing quality judgment and grabbing posture generation are both completed by a fully convolutional neural network, which greatly improves the calculation efficiency of the neural network.
(2) the invention adopts a chain search framework, converts the optimization of a discrete network structure into continuous variable optimization by assigning operation variables between network nodes, and establishes a neural network double-layer optimization model that optimizes the network structure after training the network weights, solving the problem of excessive calculation in the optimization process.
(3) by introducing the delay factor into the outer optimization function, the calculation speed of the neural network is also included in the optimization target, so that accuracy and speed are optimized simultaneously and the resulting optimal grabbing is closer to the actual industrial scene.
Drawings
FIG. 1 is a schematic diagram of a robot smart grab of the present invention;
FIG. 2 is a schematic diagram of a physical grabbing environment;
FIG. 3 is a flow diagram of a grab decision network workflow;
FIG. 4 is a schematic diagram of a processing method for capturing and generating network training data;
FIG. 5 is a schematic diagram of grab generation network inputs and outputs.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments, so that its objects and effects become more apparent. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
As shown in fig. 1, the physical environment required for intelligent grabbing of the present invention comprises a hand-eye robot, a two-finger parallel adaptive gripper, a depth camera, and a set of objects to be grabbed. The hand-eye robot and the two-finger parallel adaptive clamping jaw are the main grabbing actuators and are responsible for transmitting position and posture information to the upper computer; the depth camera is responsible for transmitting point cloud information of the grabbed object to the upper computer. In this embodiment, the robot is a 6-axis collaborative robot, the depth camera can acquire color pictures and 2.5D depth point cloud pictures, and the set of objects to be grabbed is one or more objects randomly placed on a horizontal plane in the working space of the robot. The depth camera is mounted eye-in-hand, i.e. the camera is fixed relative to the end of the robot. The robot can acquire the pose of its tool coordinate system, and the position and posture of the depth camera can be obtained through hand-eye calibration from the camera coordinate system to the tool coordinate system, thereby determining the posture and working state of the main hardware in the current physical environment and obtaining the point cloud information of the objects to be grabbed and placed.
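The composition of the tool pose with the hand-eye transform can be sketched with 4×4 homogeneous matrices; the function and argument names below are illustrative:

```python
import numpy as np

def camera_pose_in_base(T_base_tool: np.ndarray,
                        T_tool_cam: np.ndarray) -> np.ndarray:
    """The camera pose in the robot base frame is the tool pose
    (queried from the robot controller) composed with the constant
    tool->camera transform found by hand-eye calibration."""
    return T_base_tool @ T_tool_cam
```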
As shown in fig. 2 and 3, the robot intelligent grabbing method based on convolutional neural network differentiable structure search runs in the upper computer: a grabbing attitude neural network is constructed and then optimized through a gradient-based neural network double-layer optimization model, with both calculation accuracy and time taken as optimization targets.
The task of the grabbing pose generation network is to take the RGB image P_c and the depth image P_d produced by the same depth camera as input, and to identify and grab the objects given in the picture. In this embodiment, all grabs considered by the network are perpendicular to the object placement plane, i.e. the object is placed on a horizontal plane and the gripper grabs perpendicular to that plane. The RGB-D picture composed of P_c (the color picture, with three RGB channels) and P_d (the depth picture, with a single depth channel) is denoted P_s.
A grab in picture space, perpendicular to the horizontal plane, is defined by g = (p, Φ, w, q), where p = (u, v) determines the pixel position of the grab, Φ defines the rotation angle of the gripper around the vertical direction during grabbing, w defines the opening of the gripper jaws during grabbing, and q defines the grabbing quality; the larger the value of q, the greater the likelihood of a successful grab at that position.
The grabbing gesture generation network makes a grabbing prediction for every pixel of the input picture, giving the grabbing angle and grabbing width required when grabbing at that pixel together with the probability that the grab succeeds. As shown in fig. 5, the network outputs three characteristic images G = {Φ, W, Q} ∈ R^{H×W×3}; the pixel value at each (u, v) location represents the corresponding physical quantity of the grab, and these pixels together form the output of the network.
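For illustration, such a fully convolutional output head can be sketched as follows; the 1×1 convolutions and the sigmoid on Q are assumptions, with the searched backbone sitting in front of this head:

```python
import torch
import torch.nn as nn

class GraspHead(nn.Module):
    """Illustrative fully convolutional head: from a feature map it
    produces the three single-channel output images Q, Phi, W at the
    input resolution (the backbone in between is found by the search)."""
    def __init__(self, in_ch: int = 32):
        super().__init__()
        self.q = nn.Conv2d(in_ch, 1, kernel_size=1)
        self.phi = nn.Conv2d(in_ch, 1, kernel_size=1)
        self.w = nn.Conv2d(in_ch, 1, kernel_size=1)

    def forward(self, feat):
        # sigmoid keeps Q in [0, 1]; Phi and W are regressed directly
        return torch.sigmoid(self.q(feat)), self.phi(feat), self.w(feat)
```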
To achieve this, the data set needs to be processed to train the grabbing generation network. The open-source Cornell Grasping Dataset provides color-depth images together with grabbing frames of successful grabs. As shown in fig. 4, the source data set is preprocessed as follows: for the grabbing quality Q, the central 1/3 portion of each grabbing frame along the grabbing width is regarded as a suitable grabbing position; for this portion, the grabbing quality q is filled with 1, the rotation angle Φ of the grabbing frame relative to the picture is filled within [−π/2, π/2], and the grabbing opening w is filled within [0, 150]. The other portions are regarded as places where grabbing is impossible, and the grabbing quality q is set to 0. Similarly, in generating Φ and W, the regions with grabbing quality 0 have their grabbing angle and opening set to 0 accordingly, and are no longer considered graspable.
The neural network structure optimization algorithm is gradient-based and takes calculation accuracy and time as optimization targets at the same time, so as to obtain a neural network structure that balances accuracy and speed.
The neural network structure optimization algorithm mainly involves two aspects: the search space and the search algorithm. The search space is formed by a plurality of nodes connected in a chain structure; each node represents an intermediate result after a calculation operation, and the nodes are connected by directed arrow lines representing all possible candidate neural network calculation operations. In order to make the most of the convolution characteristics, a neural network calculation operation refers to a calculation between two nodes using convolution kernels of different sizes and different numbers of convolution layers; the chain connection of the nodes maximizes the use of computing resources and accelerates the convergence of the optimization algorithm.
After the number of nodes and the candidate operations among them are determined, a normalized variable α is assigned to the operations between the original nodes, so that the discrete operations are expressed by the continuous variable α; this relaxes the discrete search space to a continuous one, so that a gradient-based double-layer optimization model of the neural network structure can be established. Specifically, each directed arrow line between nodes is multiplied by its corresponding variable α, and the results are summed as the final calculation result:

$$x_j = \sum_{i=1}^{n_j} \frac{e^{\alpha_i^{(j)}}}{\sum_{k=1}^{n_j} e^{\alpha_k^{(j)}}}\, o_i^{(j)}(x_{j-1}) \tag{1}$$

where e is the base of the natural logarithm, $x_j$ is the output of the j-th node, $\alpha_i^{(j)}$ is the i-th operation variable in the j-th node, $o_i^{(j)}$ is the i-th operation equation in the j-th node, and $n_j$ is the number of operations contained in the j-th node.
The double-layer optimization model comprises an inner-layer optimization and an outer-layer optimization: the inner-layer optimization trains all weight coefficients w of the neural network on the training set; the outer-layer optimization trains the neural network operation variables α on the verification set given the trained weight coefficients w*.
The inner-layer optimization trains all weight coefficients w of the neural network by computing a loss function on the training set, with the operation variables α between all neural network nodes fixed; the outer-layer optimization trains the neural network operation variables α by computing a loss function on the verification set, given the trained weight coefficients w*. The specific calculation is given by formulas (2) and (3), where formula (3) is the inner optimization function and formula (2) is the outer optimization function; in order to take both calculation accuracy and time as optimization targets, a delay factor is introduced into the outer optimization function, which adjusts the loss function by the quotient of the floating-point operation count of the current neural network and that of the target neural network:

$$\min_{\alpha}\ \mathcal{L}_{val}\bigl(w^*(\alpha),\,\alpha\bigr)\cdot\Bigl(\tfrac{F(m)}{T}\Bigr)^{\omega} \tag{2}$$

$$\text{s.t.}\quad w^*(\alpha)=\arg\min_{w}\ \mathcal{L}_{train}(w,\,\alpha) \tag{3}$$

where $\mathcal{L}_{val}$ is the loss function calculated on the verification set, $\mathcal{L}_{train}$ is the loss function calculated on the training set, F is the equation of the floating-point operation count of the neural network, m is the neural network structure obtained through discretization, T is the set target floating-point operation count, and ω is a constant that controls the magnitude of the delay factor effect.
In addition, in order to enable the outer function to determine the correct convergence gradient more quickly, the inner function is iterated until it approaches convergence before the outer function is updated; meanwhile, convergence is judged by observing the loss sets obtained over multiple iterations of the inner function, which prevents the optimization from stopping at a local optimum. The convergence criterion of the inner function is defined as follows:

$$\frac{G_k^{\max}-G_k^{\min}}{\bar{G}_k}\le\varepsilon_1,\qquad \frac{\bigl|\bar{G}_k-\bar{G}_{k-1}\bigr|}{\bar{G}_{k-1}}\le\varepsilon_2$$

with

$$G_k=\bigl\{\mathcal{L}_{train}(w_i)\ \big|\ i=(k-1)N+1,\dots,kN\bigr\},\quad \bar{G}_k=\frac{1}{N}\sum_{\ell\in G_k}\ell,\quad G_k^{\max}=\max G_k,\quad G_k^{\min}=\min G_k$$

where N is the number of inner-function calculations in each group, $G_k$ is the k-th set of inner-function loss values, $\bar{G}_k$ is the mean of the k-th loss value set, $G_k^{\max}$ is its maximum, $G_k^{\min}$ is its minimum, $w_i$ is the weight coefficient of the network obtained at the i-th optimization iteration, $\varepsilon_1$ is the fluctuation convergence threshold of the loss function set, and $\varepsilon_2$ is the mean convergence threshold of the loss function set.
After the optimization is completed, the operation variables α of the neural network are ranked; the single operation with the highest α value (or the top several operations with higher α values) is selected to compose the final convolutional neural network, and the weight coefficients w are retrained on the training set to obtain the trained neural network.
Then, an RGB-D image shot by the depth camera at the end of the robot is input into the grabbing posture generation network with optimal parameters, which outputs three single-channel characteristic images, of the same length and width as the input image, giving the grabbing quality Q, grabbing angle Φ and grabbing opening W corresponding to each pixel point. The pixel point with the maximum grabbing quality Q is then selected from these images, its position is taken as the central position of the grabbing frame, and the upper computer controls the robot and the mechanical claw to grab the object, as shown in fig. 5.
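Selecting the grab from the three output maps reduces to an argmax over Q; a minimal sketch follows (the mapping of the chosen pixel to a robot pose via hand-eye calibration is omitted):

```python
import numpy as np

def select_grasp(Q, Phi, W):
    """Pick the pixel with the highest grabbing quality and read off
    the grab parameters at that location; (u, v) is then mapped to a
    robot pose through the hand-eye calibration (not shown)."""
    v, u = np.unravel_index(np.argmax(Q), Q.shape)
    return {"pixel": (u, v), "angle": float(Phi[v, u]),
            "width": float(W[v, u]), "quality": float(Q[v, u])}
```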
In order to verify the superiority of the method, grabbing with the present method (DARTS-RG) is compared with an existing neural network structure optimization algorithm based on an evolutionary algorithm (GA-RG). The evolutionary algorithm is a typical discrete optimization algorithm: it requires a large amount of optimization time, but can explore neural network structures more freely. Table 1 compares DARTS-RG and GA-RG in robot intelligent grabbing; the optimization time of the present invention is much shorter than that of the GA-RG algorithm both with and without the delay factor, and without the delay factor the grabbing accuracy of the method is also higher than that of the GA-RG algorithm.
TABLE 1 comparison of GA-RG and DARTS-RG Performance
(Table 1 values appear only as an image in the original publication.)
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art may still modify the described embodiments or substitute equivalents for some of their elements. All modifications and equivalents that come within the spirit and principle of the invention are intended to be included within its scope.

Claims (8)

1. A robot intelligent grabbing method based on convolutional neural network differentiable structure searching is characterized by comprising the following steps:
(1) constructing a training set and a verification set of a grabbing posture generation network, wherein the training set and the verification set each comprise network inputs and outputs, the network input being an RGB-D image and the output being the grabbing quality Q, grabbing angle Φ and grabbing opening W corresponding to each pixel point;
(2) constructing a chain search space consisting of a plurality of nodes, and determining candidate convolution calculation operation among the nodes;
(3) relaxing the discrete chained search space to continuous;
(4) establishing a gradient-based neural network double-layer optimization model that takes both the calculation speed and the accuracy of the neural network as optimization targets, wherein the double-layer model comprises an inner-layer optimization and an outer-layer optimization: the inner-layer optimization trains all weight coefficients w of the neural network on the training set, and the outer-layer optimization trains the neural network operation variables α on the verification set given the trained weight coefficients w*;
then selecting the operation variables α to compose a convolutional neural network, and retraining the weight coefficients on the training set to obtain a grabbing attitude generation network with optimal parameters;
(5) inputting an RGB-D image shot by a depth camera at the end of the robot into the grabbing gesture generation network with optimal parameters, and outputting three single-channel characteristic images, of the same length and width as the input image, giving the grabbing quality Q, grabbing angle Φ and grabbing opening W corresponding to each pixel point;
(6) selecting the pixel point with the largest grabbing quality Q from the images obtained in step (5), taking its position as the central position of the grabbing frame, and controlling the robot and the mechanical claw through the upper computer to grab the object.
2. The robot intelligent grabbing method based on convolutional neural network differentiable structure search according to claim 1, wherein the training set and the verification set in step (1) are obtained from an existing robot intelligent grabbing data set that provides RGB-D images together with grabbing frames of successful grabs, and the grabbing quality Q, grabbing angle Φ and grabbing opening W of each pixel point are generated by the following preprocessing of the three characteristics:
equally dividing each grabbing frame into three parts along the grabbing width direction; for the part located in the center, filling the grabbing quality q with 1, filling the rotation angle Φ of the grabbing frame relative to the picture, and filling the grabbing opening w, wherein Φ ∈ [−π/2, π/2] and w ∈ [0, 150]; the two parts located on both sides are filled with grabbing quality q = 0.
3. The robot intelligent grabbing method based on convolutional neural network differentiable structure search according to claim 1, wherein the chain search space is composed of a plurality of nodes, each node representing an intermediate result after a calculation operation; the nodes are connected by directed arrow lines, each of which represents all possible candidate neural network calculation operations; a neural network calculation operation refers to a calculation between two nodes using convolution kernels of different sizes and different numbers of convolution layers; the nodes are connected in a chain, which maximizes the use of computing resources and accelerates the convergence of the optimization algorithm.
4. The convolutional neural network differentiable structure search based robot intelligent grasping method according to claim 1, wherein the step (3) of relaxing the discrete chain search space to continuous is realized by the following steps:
assigning a normalized continuous variable α to the operations between the original nodes, so that the discrete operations are expressed by the continuous variable α; specifically, each directed arrow line between nodes is multiplied by its corresponding variable α, and the results are summed as the final calculation result:

$$x_j = \sum_{i=1}^{n_j} \frac{e^{\alpha_i^{(j)}}}{\sum_{k=1}^{n_j} e^{\alpha_k^{(j)}}}\, o_i^{(j)}(x_{j-1}) \tag{1}$$

where e is the base of the natural logarithm, $x_j$ is the output of the j-th node, $\alpha_i^{(j)}$ is the i-th operation variable in the j-th node, $o_i^{(j)}$ is the i-th operation equation in the j-th node, and $n_j$ is the number of operations contained in the j-th node.
5. The method for robot intelligent grabbing based on convolutional neural network differentiable structure search of claim 1,
the inner-layer optimization trains all weight coefficients w of the neural network by computing a loss function on the training set, with the operation variables α between all neural network nodes fixed;
the outer-layer optimization trains the neural network operation variables α by computing a loss function on the verification set, given the trained weight coefficients w*;
the specific calculation is given by formulas (2) and (3), where formula (3) is the inner optimization function and formula (2) is the outer optimization function; in order to take both calculation accuracy and time as optimization targets, a delay factor is introduced into the outer optimization function, which adjusts the loss function by the quotient of the floating-point operation count of the current neural network and that of the target neural network:

$$\min_{\alpha}\ \mathcal{L}_{val}\bigl(w^*(\alpha),\,\alpha\bigr)\cdot\Bigl(\tfrac{F(m)}{T}\Bigr)^{\omega} \tag{2}$$

$$\text{s.t.}\quad w^*(\alpha)=\arg\min_{w}\ \mathcal{L}_{train}(w,\,\alpha) \tag{3}$$

where $\mathcal{L}_{val}$ is the loss function calculated on the verification set, $\mathcal{L}_{train}$ is the loss function calculated on the training set, F is the equation of the floating-point operation count of the neural network, m is the neural network structure obtained through discretization, T is the set target floating-point operation count, and ω is a constant that controls the magnitude of the delay factor effect.
6. The robot intelligent grabbing method based on convolutional neural network differentiable structure search according to claim 5, wherein in step (4), in order to enable the outer function to determine the correct convergence gradient more quickly, the inner function is iterated until it approaches convergence before the outer function is updated; meanwhile, convergence is judged by observing the loss sets obtained over multiple iterations of the inner function, which prevents the optimization from stopping at a local optimum; the convergence criterion of the inner function is defined as follows:

$$\frac{G_k^{\max}-G_k^{\min}}{\bar{G}_k}\le\varepsilon_1,\qquad \frac{\bigl|\bar{G}_k-\bar{G}_{k-1}\bigr|}{\bar{G}_{k-1}}\le\varepsilon_2$$

with

$$G_k=\bigl\{\mathcal{L}_{train}(w_i)\ \big|\ i=(k-1)N+1,\dots,kN\bigr\},\quad \bar{G}_k=\frac{1}{N}\sum_{\ell\in G_k}\ell,\quad G_k^{\max}=\max G_k,\quad G_k^{\min}=\min G_k$$

where N is the number of inner-function calculations in each group, $G_k$ is the k-th set of inner-function loss values, $\bar{G}_k$ is the mean of the k-th loss value set, $G_k^{\max}$ is its maximum, $G_k^{\min}$ is its minimum, $w_i$ is the weight coefficient of the network obtained at the i-th optimization iteration, $\varepsilon_1$ is the fluctuation convergence threshold of the loss function set, and $\varepsilon_2$ is the mean convergence threshold of the loss function set.
7. The robot intelligent grabbing method based on convolutional neural network differentiable structure search according to claim 6, wherein in step (4), after the optimization is completed, the operation variables α of the neural network are ranked; the single operation with the highest α value (or the top several operations with higher α values) is selected to compose the final convolutional neural network, and the weight coefficients w are retrained on the training set to obtain the trained neural network.
8. The robot intelligent grabbing method based on convolutional neural network differentiable structure search according to claim 1, wherein the physical grabbing environment of the robot comprises a physical robot, two parallel adaptive clamping jaws, a depth camera and a set of objects to be grabbed; the two parallel adaptive clamping jaws and the depth camera are fixed at the end of the physical robot, and their relative positions remain unchanged during motion; the two parallel adaptive clamping jaws are perpendicular to the grabbing plane.
CN202110802383.4A 2021-07-15 2021-07-15 Robot intelligent grabbing method based on convolutional neural network differentiable structure searching Active CN113326666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110802383.4A CN113326666B (en) 2021-07-15 2021-07-15 Robot intelligent grabbing method based on convolutional neural network differentiable structure searching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110802383.4A CN113326666B (en) 2021-07-15 2021-07-15 Robot intelligent grabbing method based on convolutional neural network differentiable structure searching

Publications (2)

Publication Number Publication Date
CN113326666A 2021-08-31
CN113326666B CN113326666B (en) 2022-05-03

Family

ID=77426450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110802383.4A Active CN113326666B (en) 2021-07-15 2021-07-15 Robot intelligent grabbing method based on convolutional neural network differentiable structure searching

Country Status (1)

Country Link
CN (1) CN113326666B (en)

Citations (7)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510062A (en) * 2018-03-29 2018-09-07 东南大学 A kind of robot irregular object crawl pose rapid detection method based on concatenated convolutional neural network
WO2020075423A1 (en) * 2018-10-10 2020-04-16 ソニー株式会社 Robot control device, robot control method and robot control program
US20200164505A1 (en) * 2018-11-27 2020-05-28 Osaro Training for Robot Arm Grasping of Objects
US20210023720A1 (en) * 2018-12-12 2021-01-28 Cloudminds (Shenzhen) Robotics Systems Co., Ltd. Method for detecting grasping position of robot in grasping object
EP3812107A1 (en) * 2019-10-21 2021-04-28 Canon Kabushiki Kaisha Robot control device, and method and program for controlling the same
CN111360862A (en) * 2020-02-29 2020-07-03 华南理工大学 Method for generating optimal grabbing pose based on convolutional neural network
CN112297013A (en) * 2020-11-11 2021-02-02 浙江大学 Robot intelligent grabbing method based on digital twin and deep neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘飞香: "Research on intelligent grasping and assembly of segment erectors" (管片拼装机抓取和拼装智能化研究), 《铁道建筑》 *
王斌等: "Research on robot grasp detection algorithms based on depth images and deep learning" (基于深度图像和深度学习的机器人抓取检测算法研究), 《中国博士学位论文全文数据库 信息科技辑》 *

Also Published As

Publication number Publication date
CN113326666B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
Karaoguz et al. Object detection approach for robot grasp detection
Sadeghi et al. Sim2real viewpoint invariant visual servoing by recurrent control
CN111079561B (en) Robot intelligent grabbing method based on virtual training
CN110298886B (en) Dexterous hand grabbing planning method based on four-stage convolutional neural network
CN110000785B (en) Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment
JP6514171B2 (en) Machine learning apparatus and method for learning an optimal article gripping path
CN112297013B (en) Robot intelligent grabbing method based on digital twin and deep neural network
CN110125930B (en) Mechanical arm grabbing control method based on machine vision and deep learning
CN110785268B (en) Machine learning method and device for semantic robot grabbing
JP6964857B2 (en) Image recognition device, image recognition method, computer program, and product monitoring system
CN112605983B (en) Mechanical arm pushing and grabbing system suitable for intensive environment
CN110238840B (en) Mechanical arm autonomous grabbing method based on vision
JP6671694B1 (en) Machine learning device, machine learning system, data processing system, and machine learning method
CN113172629B (en) Object grabbing method based on time sequence tactile data processing
CN113341706B (en) Man-machine cooperation assembly line system based on deep reinforcement learning
CN112686282A (en) Target detection method based on self-learning data
CN111476771A (en) Domain self-adaptive method and system for generating network based on distance countermeasure
CN114789454B (en) Robot digital twin track completion method based on LSTM and inverse kinematics
Huang et al. Grasping novel objects with a dexterous robotic hand through neuroevolution
Yan et al. Learning probabilistic multi-modal actor models for vision-based robotic grasping
CN111300431B (en) Cross-scene-oriented robot vision simulation learning method and system
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
CN113326666B (en) Robot intelligent grabbing method based on convolutional neural network differentiable structure searching
CN111496794B (en) Kinematics self-grabbing learning method and system based on simulation industrial robot
Lu et al. Active pushing for better grasping in dense clutter with deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant