CN110717532A - Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model - Google Patents

Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model

Info

Publication number
CN110717532A
Authority
CN
China
Prior art keywords
grabbing
feature
model
real
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910925919.4A
Other languages
Chinese (zh)
Inventor
卢智亮 (Lu Zhiliang)
曾碧 (Zeng Bi)
林伟 (Lin Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201910925919.4A
Publication of CN110717532A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 - Selection of the most significant subset of features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/60 - Rotation of whole images or parts thereof
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a real-time detection method for a robot target grabbing area based on an SE-RetinaGrasp model, which comprises the following steps: downloading a data set through an interface and acquiring, with a visual sensor, images containing the target objects to be grabbed by the robot, to construct the training data set; preprocessing the images in the training data set; constructing a grabbing detection model by adopting a RetinaNet model and SENet modules; inputting the preprocessed training data set into the grabbing detection model, and training the grabbing detection model by adopting a transfer learning method and a stochastic gradient descent method; and acquiring, in real time with the visual sensor, the robot target grabbing image to be detected, and feeding it to the grabbing detection model to obtain a target grabbing area detection image with a grabbing frame. The method improves the prediction of the grabbing area and the detection accuracy, and effectively enhances the model's ability to capture detail information.

Description

Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model
Technical Field
The invention relates to the technical field of robot grabbing, in particular to a robot target grabbing area real-time detection method based on an SE-RetinaGrasp model.
Background
In the field of intelligent robots, autonomous grabbing is a key capability of an intelligent robot. Existing methods for detecting the robot target grabbing area include grabbing area detection based on a sliding-window detection frame, global grabbing prediction, and second-order (two-stage) grabbing detection.
The sliding-window approach searches the grabbing area exhaustively, so it is slow and computationally expensive and cannot meet the real-time requirement of robot grabbing detection. Global grabbing prediction easily produces an averaging effect: the predicted grabbing frame tends toward the center of the object, so prediction is poor when the graspable part is the edge of an object such as a plate. Two-stage grabbing detection achieves higher accuracy, but at the cost of detection time, so it also cannot meet the real-time requirement of robot grabbing; moreover, its predicted grabbing frames are large and its performance on small grabbing frames is insufficient, so the accuracy of the grabbing frame still needs to be improved.
Disclosure of Invention
Aiming at overcoming the defect of low accuracy in grabbing area prediction in the prior art, the invention provides a real-time detection method for a robot target grabbing area based on an SE-RetinaGrasp model.
In order to solve the technical problems, the technical scheme of the invention is as follows:
The robot target grabbing area real-time detection method based on the SE-RetinaGrasp model comprises the following steps:
S1: downloading a data set through an interface and acquiring, with a visual sensor, images containing the target objects to be grabbed by the robot, to construct the training data set;
S2: preprocessing the images in the training data set;
S3: constructing a grabbing detection model by adopting a RetinaNet model and SENet modules;
S4: inputting the preprocessed training data set into the grabbing detection model, and training the grabbing detection model by adopting a transfer learning method and a stochastic gradient descent method;
S5: acquiring, in real time with the visual sensor, the robot target grabbing image to be detected, and feeding it to the grabbing detection model to obtain a target grabbing area detection image with a grabbing frame.
In this technical scheme, the position and angle of the grabbing frame are extracted on the basis of the one-stage detection model RetinaNet. Embedding SENet modules increases the weight of the feature channels that matter for grabbing in the image to be detected: the modules establish the interdependence between feature channels, promote features that contribute positively to the grabbing detection task and suppress useless ones, thereby improving detection accuracy. When training the grabbing detection model, the training data set is built both from a data set downloaded through an interface and from images, acquired with a visual sensor, that contain the target objects to be grabbed by the robot, which ensures the diversity of the training samples; preprocessing the training data set allows the grabbing detection model to detect the grabbing area quickly. After the grabbing detection model has been trained, the robot can acquire the target grabbing image to be detected in real time with the visual sensor and feed it to the trained model to obtain an RGB image of the target object on which a grabbing frame for grabbing the object is drawn, realizing real-time detection of the target grabbing area.
Preferably, in step S2, the specific steps of preprocessing the images in the training data set include:
S21: randomly translating the images in the training data set within n pixels along the x axis and the y axis respectively, where n is a positive integer and n ≥ 50;
S22: randomly rotating the translated images within the range of 0-360°;
S23: center-cropping the rotated images to obtain images of the same size;
S24: adjusting the image resolution of the cropped images;
S25: performing data labelling on the resolution-adjusted images: the 180° angle range, together with a background class, is divided into a plurality of label categories, and the angle value in each label is assigned to the nearest angle bin.
Preferably, in step S3, the RetinaNet model includes a ResNet50 feature-extraction network, a feature pyramid (FPN) structure and 3 FCN subnetworks, where a SENet module is embedded after each residual block in the ResNet50 feature-extraction network; the outputs of the SENet modules of layers 3, 4 and 5 of the ResNet50 feature-extraction network are respectively connected to the inputs of the feature pyramid FPN structure; and the outputs of the feature pyramid FPN structure are respectively connected to the inputs of the 3 FCN subnetworks.
Preferably, the SENet module performs the following operations on the feature map output by the residual block:
Squeeze operation: global average pooling compresses each feature map, converting the C feature maps into a 1 × 1 × C vector of real numbers;
Excitation operation: a first convolution layer reduces the feature dimension from C to C/r, a ReLU activation adds nonlinearity, a second convolution layer restores the reduced features to the original dimension, and a Sigmoid function yields the normalized weights, where r denotes the compression ratio;
Feature recalibration operation: the weights are multiplied channel by channel onto the original features, recalibrating the original features.
Preferably, the output of the layer-5 SENet module in the ResNet50 feature-extraction network is connected in sequence to a convolution layer with a 3 × 3 kernel and stride 2, a ReLU activation layer, and another convolution layer with a 3 × 3 kernel and stride 2.
Preferably, the feature pyramid FPN structure adopts a balanced feature pyramid structure: the P3 feature map output by the layer-3 SENet module is max-pooled and the P5 feature map output by the layer-5 SENet module is upsampled so that their resolutions match that of the P4 feature map output by the layer-4 SENet module; the corresponding elements of the P3, P4 and P5 feature maps are then added and averaged to obtain the balanced feature map P' for output. The expression is:

P' = (1/Y) * Σ_{l = l_min}^{l_max} P_l

where Y is the number of layers being added, l_max denotes the highest layer, l_min denotes the lowest layer, and P_l denotes the features of the l-th layer.
Preferably, in step S4, the specific steps include:
S41: downloading the Microsoft COCO data set through an interface to pre-train the ResNet50 feature-extraction network of the grabbing detection model, obtaining initial values for its parameters;
S42: initializing the parameters of the grabbing detection model using a standard Gaussian distribution;
S43: feeding the images of the preprocessed training data set to the grabbing detection model in RGB form and training the model by stochastic gradient descent, with the learning rate initialized to 0.0001, a learning-rate decay factor of 5, a batch size of 2 images, and the number of epochs set to 20.
Preferably, step S4 further includes: training the grabbing detection model with the Focal Loss function as the classification loss, the Focal Loss being computed as

FL = -(1/N) * Σ_i Σ_t [ α_t * y_{i,t} * (1 - p_{i,t})^γ * log(p_{i,t}) ]

where N is the number of samples and T is the number of classes; y_{i,t} is the label indicating whether the i-th sample belongs to the t-th class, and p_{i,t} is the predicted probability that the i-th sample belongs to the t-th class; α_t is a balance parameter that adjusts the contribution of positive and negative samples to the total classification loss; and γ is a hyperparameter.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: the grabbing detection model is built by combining SENet modules, a feature pyramid FPN structure and FCN subnetworks, which improves the prediction of the grabbing area; the SENet structure establishes the interdependence between feature channels and strengthens the image features relevant to grabbing detection, improving detection accuracy; and the balanced feature pyramid FPN structure fuses feature information from different levels, enhancing the model's ability to capture detail information and effectively strengthening the detection of small grabbing frames.
Drawings
Fig. 1 is a flowchart of a robot target grabbing area real-time detection method based on an SE-RetinaGrasp model according to this embodiment.
Fig. 2 is a schematic structural diagram of the SE-RetinaGrasp model in this embodiment.
Fig. 3 is a schematic structural diagram of the SE-RetinaGrasp model of this embodiment.
Fig. 4 is a schematic structural diagram of the SENet module of the present embodiment.
Fig. 5 is a schematic structural diagram of the feature pyramid FPN structure of the present embodiment.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Fig. 1 is a flowchart of a real-time detection method for a target grabbing area of a robot based on an SE-RetinaGrasp model according to this embodiment.
The embodiment provides a robot target grabbing area real-time detection method based on an SE-RetinaGrasp model, which comprises the following steps:
S1: downloading a data set through an interface and acquiring, with a visual sensor, images containing the target objects to be grabbed by the robot, to construct the training data set.
In this embodiment, the Cornell Grasping Dataset is downloaded through the interface, and images containing the target objects to be grabbed by the robot are acquired with the vision sensor. The Cornell Grasping Dataset covers a wide variety of object types but contains a relatively small number of samples, so combining it with images acquired by the vision sensor enriches the types of training samples.
S2: preprocessing the images in the training dataset.
In this step, the specific steps of preprocessing the images in the training data set include:
S21: randomly translating the images in the training data set by up to 50 pixels along the x axis and the y axis respectively;
S22: randomly rotating the translated images within the range of 0-360°;
S23: center-cropping the rotated images to a size of 321 × 321;
S24: resizing the cropped images to a resolution of 227 × 227;
S25: performing data labelling on the resolution-adjusted images: the 180° angle range, together with a background class, is divided into 20 label categories, and the angle value in each label is assigned to the nearest angle bin.
In this embodiment, random translation and random rotation make the images in the training data set cover as many situations as possible, so that the robot can grab objects accurately and stably in diverse, arbitrary poses; cropping the images and adjusting their resolution allows the grabbing area to be detected quickly. Taking the symmetry of the grabbing angle into account, the 180° angle range is divided into 19 bins and a background class is added, giving 20 classes in total; the angle value in each label is assigned to the nearest bin, and the original oriented rectangle is replaced by an axis-aligned rectangle without angular inclination, so that during subsequent training the grabbing detection model fits a rectangle perpendicular to the x axis of the image and predicts its angle class.
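The angle discretisation described above can be illustrated with the following sketch, assuming 19 angle bins over [0°, 180°) plus a background class with index 0 and assignment to the nearest bin center; the function names are illustrative and not part of the embodiment.

import numpy as np

NUM_ANGLE_BINS = 19                      # angle classes; class index 0 is reserved for background
BIN_WIDTH = 180.0 / NUM_ANGLE_BINS

def angle_to_class(theta_deg: float) -> int:
    """Map a grasp-rectangle orientation to a label in {1, ..., 19}."""
    theta = theta_deg % 180.0            # 180-degree symmetry of the grabbing rectangle
    centers = (np.arange(NUM_ANGLE_BINS) + 0.5) * BIN_WIDTH
    return int(np.argmin(np.abs(centers - theta))) + 1   # +1 keeps class 0 for background

def class_to_angle(cls: int) -> float:
    """Recover the representative angle (bin center) for a predicted class."""
    return (cls - 0.5) * BIN_WIDTH

print(angle_to_class(93.0), class_to_angle(angle_to_class(93.0)))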
S3: and constructing a grabbing detection model by adopting a RetinaNet model and a SEnet module to obtain an SE-RetinaGrasp model.
In this embodiment, the RetinaNet model includes a ResNet50 feature-extraction network, a feature pyramid FPN structure and 3 FCN subnetworks, where a SENet module is embedded after each residual block of the ResNet50 feature-extraction network; the outputs of the SENet modules of layers 3, 4 and 5 in the ResNet50 feature-extraction network are respectively connected to the inputs of the feature pyramid FPN structure; and the outputs of the feature pyramid FPN structure are connected to the inputs of the 3 FCN subnetworks respectively.
Fig. 2 and Fig. 3 are schematic structural diagrams of the SE-RetinaGrasp model of this embodiment. In this embodiment, only the C3, C4 and C5 feature maps of ResNet50 are used, which avoids generating anchors on the high-resolution C2 feature map and reduces the detection time of the model.
The output of the layer-5 SENet module in the ResNet50 feature-extraction network is connected in sequence to a convolution layer with a 3 × 3 kernel and stride 2, a ReLU activation layer, and another convolution layer with a 3 × 3 kernel and stride 2. The C5 feature map output by the layer-5 SENet module passes through the first 3 × 3, stride-2 convolution to produce the P6 level; a ReLU activation is applied to P6 on a branch and the same convolution operation is applied again to obtain the P7 level. Candidate regions with larger areas are generated on P6 and P7, which strengthens the model's ability to detect large objects.
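By way of illustration only, a minimal PyTorch sketch of these extra P6/P7 levels follows; the channel widths (2048 for C5 and 256 for the pyramid levels) are assumptions borrowed from common ResNet50/FPN configurations rather than values fixed by the embodiment.

import torch
import torch.nn as nn

class ExtraLevels(nn.Module):
    def __init__(self, c5_channels: int = 2048, out_channels: int = 256):
        super().__init__()
        self.p6 = nn.Conv2d(c5_channels, out_channels, kernel_size=3, stride=2, padding=1)
        self.p7 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=2, padding=1)

    def forward(self, c5: torch.Tensor):
        p6 = self.p6(c5)                 # 3x3 convolution, stride 2, applied to the C5 feature map
        p7 = self.p7(torch.relu(p6))     # ReLU branch, then another 3x3 stride-2 convolution
        return p6, p7

# Example: a C5 map with 2048 channels at 7x7 resolution yields P6 (4x4) and P7 (2x2)
p6, p7 = ExtraLevels()(torch.randn(1, 2048, 7, 7))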
In this embodiment, grabbing detection detects positions on the object that can be grabbed, which differs from object detection, where the position of the object itself in the image is detected. Since images in the Cornell Grasping Dataset contain only a single target object, to better apply the RetinaNet model to the grabbing detection problem this embodiment generates grabbing candidate regions only on the three levels P3, P4 and P5, using candidate windows with base sizes of {8², 16², 32²}, combined with a set of three scales and three aspect ratios of {1:2, 1:1, 2:1}, so that grabbing candidate frames of various sizes are searched.
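The grabbing candidate windows for one pyramid level can be sketched as follows. The base sizes and aspect ratios are those stated above; the three per-size scale multipliers {2^0, 2^(1/3), 2^(2/3)} are an assumption borrowed from the standard RetinaNet recipe, since the embodiment only specifies that three scales are used.

import numpy as np

BASE_SIZES = {"P3": 8, "P4": 16, "P5": 32}
SCALES = [2 ** 0, 2 ** (1.0 / 3), 2 ** (2.0 / 3)]   # assumed scale multipliers
ASPECT_RATIOS = [0.5, 1.0, 2.0]                     # height/width ratios for 1:2, 1:1 and 2:1 windows

def anchors_for_level(level: str) -> np.ndarray:
    """Return the 9 (scale x ratio) candidate-window shapes, as (width, height) pairs, for one level."""
    base = BASE_SIZES[level]
    shapes = []
    for s in SCALES:
        area = (base * s) ** 2
        for r in ASPECT_RATIOS:
            w = np.sqrt(area / r)        # w * h = area and h / w = r
            shapes.append((w, w * r))
    return np.array(shapes)

print(anchors_for_level("P3").round(1))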
Fig. 4 is a schematic structural diagram of the SENet module of this embodiment. In this embodiment, the SENet module performs the following operations on the feature map output by the residual block:
Squeeze operation: global average pooling compresses each feature map, converting the C feature maps into a 1 × 1 × C vector of real numbers;
Excitation operation: a first convolution layer reduces the feature dimension from C to C/r, a ReLU activation adds nonlinearity, a second convolution layer restores the reduced features to the original dimension, and a Sigmoid function yields the normalized weights, where r denotes the compression ratio;
Feature recalibration operation: the weights are multiplied channel by channel onto the original features, recalibrating the original features.
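A minimal PyTorch sketch of the squeeze, excitation and recalibration operations described above follows; the default compression ratio r = 16 is an assumption taken from the original SENet design and is not fixed by the embodiment.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)                            # global average pooling -> 1x1xC
        self.reduce = nn.Conv2d(channels, channels // r, kernel_size=1)   # C -> C/r
        self.relu = nn.ReLU(inplace=True)
        self.restore = nn.Conv2d(channels // r, channels, kernel_size=1)  # C/r -> C
        self.sigmoid = nn.Sigmoid()                                       # normalized channel weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.sigmoid(self.restore(self.relu(self.reduce(self.squeeze(x)))))
        return x * w                                                      # channel-wise recalibration

# Example: recalibrate a residual-block output with 256 channels
out = SEBlock(256)(torch.randn(1, 256, 28, 28))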
Fig. 5 is a schematic structural diagram of the feature pyramid FPN structure of this embodiment. In this embodiment, the feature pyramid FPN structure adopts a balanced feature pyramid structure: the P3 feature map output by the layer-3 SENet module is max-pooled and the P5 feature map output by the layer-5 SENet module is upsampled so that their resolutions match that of the P4 feature map output by the layer-4 SENet module; the corresponding elements of the P3, P4 and P5 feature maps are then added and averaged to obtain the balanced feature map P' for output. The expression is:

P' = (1/Y) * Σ_{l = l_min}^{l_max} P_l

where Y is the number of layers being added, l_max denotes the highest layer, l_min denotes the lowest layer, and P_l denotes the features of the l-th layer.
The balanced feature map P' is then fed into a two-dimensional convolution layer, adjusted in resolution by max pooling and upsampling, and combined element-wise with the P3, P4 and P5 feature maps to obtain the balanced feature maps P3', P4' and P5' corresponding to P3, P4 and P5 respectively.
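The balancing of the P3-P5 levels can be sketched as follows, assuming PyTorch tensors of equal channel width; the 3 × 3 refining convolution stands in for the two-dimensional convolution layer mentioned above, and adding the rescaled balanced map back to each original level to form P3', P4' and P5' is an assumption about the recombination step.

import torch
import torch.nn.functional as F
from torch import nn

def balance_pyramid(p3, p4, p5, refine):
    """Balance the P3-P5 levels and return P3', P4', P5'."""
    size = p4.shape[-2:]
    p3_down = F.adaptive_max_pool2d(p3, size)              # max-pool P3 down to the P4 resolution
    p5_up = F.interpolate(p5, size=size, mode="nearest")   # upsample P5 to the P4 resolution
    balanced = refine((p3_down + p4 + p5_up) / 3.0)        # element-wise average, then refine
    # rescale the balanced map back to each level and add it to the original features
    p3_b = p3 + F.interpolate(balanced, size=p3.shape[-2:], mode="nearest")
    p4_b = p4 + balanced
    p5_b = p5 + F.adaptive_max_pool2d(balanced, p5.shape[-2:])
    return p3_b, p4_b, p5_b

refine = nn.Conv2d(256, 256, kernel_size=3, padding=1)     # stands in for the 2-D convolution layer
p3, p4, p5 = (torch.randn(1, 256, s, s) for s in (56, 28, 14))
p3_b, p4_b, p5_b = balance_pyramid(p3, p4, p5, refine)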
S4: inputting the preprocessed training data set into the grabbing detection model, and training the grabbing detection model by adopting a transfer learning method and a stochastic gradient descent method.
In this step, the specific steps include:
S41: downloading the Microsoft COCO data set through an interface to pre-train the ResNet50 feature-extraction network of the grabbing detection model, obtaining initial values for its parameters;
S42: initializing the parameters of the grabbing detection model using a standard Gaussian distribution;
S43: feeding the images of the preprocessed training data set to the grabbing detection model in RGB form and training the model by stochastic gradient descent, with the learning rate initialized to 0.0001, a learning-rate decay factor of 5, a batch size of 2 images, and the number of epochs set to 20;
S44: training the grabbing detection model with the Focal Loss function as the classification loss, the Focal Loss being computed as

FL = -(1/N) * Σ_i Σ_t [ α_t * y_{i,t} * (1 - p_{i,t})^γ * log(p_{i,t}) ]

where N is the number of samples and T is the number of classes; y_{i,t} is the label indicating whether the i-th sample belongs to the t-th class, and p_{i,t} is the predicted probability that the i-th sample belongs to the t-th class; α_t is a balance parameter that adjusts the contribution of positive and negative samples to the total classification loss; and γ is a hyperparameter.
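A self-contained sketch of this multi-class Focal Loss follows; the values α = 0.25 and γ = 2.0 are assumptions taken from the original focal-loss formulation, since the embodiment does not fix them numerically, and a single α is used in place of the per-class α_t.

import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """logits: (N, T) class scores; targets: (N,) integer labels in [0, T)."""
    probs = logits.softmax(dim=-1)                                  # p_{i,t}
    y = F.one_hot(targets, num_classes=logits.shape[-1]).float()    # y_{i,t}
    p_t = (probs * y).sum(dim=-1)                                   # probability of the true class
    loss = -alpha * (1.0 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-8))
    return loss.mean()                                              # average over the N samples

loss = focal_loss(torch.randn(4, 20), torch.tensor([0, 3, 19, 7]))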
S5: acquiring, in real time with the visual sensor, the robot target grabbing image to be detected, and feeding it to the grabbing detection model to obtain a target grabbing area detection image with a grabbing frame.
In this embodiment, the grabbing detection model is built on the basis of the RetinaNet model, combined with SENet modules, a feature pyramid FPN structure and FCN subnetworks. The SENet structure establishes the interdependence between feature channels, promoting features that contribute positively to grabbing detection and suppressing useless ones, which improves detection accuracy; the balanced feature pyramid FPN structure further fuses feature information from different levels without adding many parameters, strengthening the model's ability to capture detail information and to detect small grabbing frames. The robot target grabbing area real-time detection method based on the SE-RetinaGrasp model provided by this embodiment therefore achieves high detection accuracy, runs at a real-time detection speed, and improves the fineness of the grabbing frame to a certain extent.
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (8)

1. A robot target grabbing area real-time detection method based on the SE-RetinaGrasp model, characterized by comprising the following steps:
S1: downloading a data set through an interface and acquiring, with a visual sensor, images containing the target objects to be grabbed by the robot, to construct the training data set;
S2: preprocessing the images in the training data set;
S3: constructing a grabbing detection model by adopting a RetinaNet model and SENet modules;
S4: inputting the preprocessed training data set into the grabbing detection model, and training the grabbing detection model by adopting a transfer learning method and a stochastic gradient descent method;
S5: acquiring, in real time with the visual sensor, the robot target grabbing image to be detected, and feeding it to the grabbing detection model to obtain a target grabbing area detection image with a grabbing frame.
2. The real-time detection method of the robot target grabbing area according to claim 1, characterized in that, in step S2, the specific steps of preprocessing the images in the training data set include:
S21: randomly translating the images in the training data set within n pixels along the x axis and the y axis respectively, where n is a positive integer and n ≥ 50;
S22: randomly rotating the translated images within the range of 0-360°;
S23: center-cropping the rotated images to obtain images of the same size;
S24: adjusting the image resolution of the cropped images;
S25: performing data labelling on the resolution-adjusted images: the 180° angle range, together with a background class, is divided into a plurality of label categories, and the angle value in each label is assigned to the nearest angle bin.
3. The real-time detection method of the robot target grabbing area according to claim 2, characterized in that, in step S3, the RetinaNet model includes a ResNet50 feature-extraction network, a feature pyramid FPN structure and 3 FCN subnetworks, wherein a SENet module is embedded after each residual block in the ResNet50 feature-extraction network; the outputs of the SENet modules of layers 3, 4 and 5 in the ResNet50 feature-extraction network are respectively connected to the inputs of the feature pyramid FPN structure; and the outputs of the feature pyramid FPN structure are respectively connected to the inputs of the 3 FCN subnetworks.
4. The real-time detection method of the robot target grabbing area according to claim 3, characterized in that the SENet module performs the following operations on the feature map output by the residual block:
squeeze operation: global average pooling compresses each feature map, converting the C feature maps into a 1 × 1 × C vector of real numbers;
excitation operation: a first convolution layer reduces the feature dimension from C to C/r, a ReLU activation adds nonlinearity, a second convolution layer restores the reduced features to the original dimension, and a Sigmoid function yields the normalized weights, where r is the compression ratio;
feature recalibration operation: the weights are multiplied channel by channel onto the original features, recalibrating the original features.
5. The real-time detection method of the robot target grabbing area according to claim 3, characterized in that the output of the layer-5 SENet module in the ResNet50 feature-extraction network is connected in sequence to a convolution layer with a 3 × 3 kernel and stride 2, a ReLU activation layer, and another convolution layer with a 3 × 3 kernel and stride 2.
6. The real-time detection method of the robot target grabbing area according to claim 3, characterized in that the feature pyramid FPN structure adopts a balanced feature pyramid structure: the P3 feature map output by the layer-3 SENet module is max-pooled and the P5 feature map output by the layer-5 SENet module is upsampled so that their resolutions match that of the P4 feature map output by the layer-4 SENet module; the corresponding elements of the P3, P4 and P5 feature maps are then added and averaged to obtain the balanced feature map P' for output; the expression is:

P' = (1/Y) * Σ_{l = l_min}^{l_max} P_l

where Y is the number of layers being added, l_max denotes the highest layer, l_min denotes the lowest layer, and P_l denotes the features of the l-th layer.
7. The real-time detection method of the robot target grabbing area according to claim 3, characterized in that step S4 specifically includes:
S41: downloading the Microsoft COCO data set through an interface to pre-train the ResNet50 feature-extraction network of the grabbing detection model, obtaining initial values for its parameters;
S42: initializing the parameters of the grabbing detection model using a standard Gaussian distribution;
S43: feeding the images of the preprocessed training data set to the grabbing detection model in RGB form and training the model by stochastic gradient descent, with the learning rate initialized to 0.0001, a learning-rate decay factor of 5, a batch size of 2 images, and the number of epochs set to 20.
8. The real-time detection method of the robot target grabbing area according to claim 7, characterized in that step S4 further includes: training the grabbing detection model with the Focal Loss function as the classification loss, the Focal Loss being computed as

FL = -(1/N) * Σ_i Σ_t [ α_t * y_{i,t} * (1 - p_{i,t})^γ * log(p_{i,t}) ]

where N is the number of samples and T is the number of classes; y_{i,t} is the label indicating whether the i-th sample belongs to the t-th class, and p_{i,t} is the predicted probability that the i-th sample belongs to the t-th class; α_t is a balance parameter that adjusts the contribution of positive and negative samples to the total classification loss; and γ is a hyperparameter.
CN201910925919.4A 2019-09-27 2019-09-27 Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model Pending CN110717532A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910925919.4A CN110717532A (en) 2019-09-27 2019-09-27 Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910925919.4A CN110717532A (en) 2019-09-27 2019-09-27 Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model

Publications (1)

Publication Number Publication Date
CN110717532A true CN110717532A (en) 2020-01-21

Family

ID=69211051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910925919.4A Pending CN110717532A (en) 2019-09-27 2019-09-27 Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model

Country Status (1)

Country Link
CN (1) CN110717532A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507199A (en) * 2020-03-25 2020-08-07 杭州电子科技大学 Method and device for detecting mask wearing behavior
CN111523478A (en) * 2020-04-24 2020-08-11 中山大学 Pedestrian image detection method acting on target detection system
CN111553321A (en) * 2020-05-18 2020-08-18 城云科技(中国)有限公司 Mobile vendor target detection model, detection method and management method thereof
CN111626379A (en) * 2020-07-07 2020-09-04 中国计量大学 X-ray image detection method for pneumonia
CN111783772A (en) * 2020-06-12 2020-10-16 青岛理工大学 Grabbing detection method based on RP-ResNet network
CN112068422A (en) * 2020-08-04 2020-12-11 广州中国科学院先进技术研究所 Grabbing learning method and device of intelligent robot based on small samples
CN112330664A (en) * 2020-11-25 2021-02-05 腾讯科技(深圳)有限公司 Pavement disease detection method and device, electronic equipment and storage medium
CN112633218A (en) * 2020-12-30 2021-04-09 深圳市优必选科技股份有限公司 Face detection method and device, terminal equipment and computer readable storage medium
CN112686297A (en) * 2020-12-29 2021-04-20 中国人民解放军海军航空大学 Radar target motion state classification method and system
CN113076972A (en) * 2021-03-04 2021-07-06 山东师范大学 Two-stage Logo image detection method and system based on deep learning
CN113762159A (en) * 2021-09-08 2021-12-07 山东大学 Target grabbing detection method and system based on directional arrow model
CN115998295A (en) * 2023-03-24 2023-04-25 广东工业大学 Blood fat estimation method, system and device combining far-near infrared light

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卢智亮 等 (Lu Zhiliang et al.): "机器人目标抓取区域实时检测方法" (Real-time detection method for a robot target grabbing area) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507199A (en) * 2020-03-25 2020-08-07 杭州电子科技大学 Method and device for detecting mask wearing behavior
CN111523478A (en) * 2020-04-24 2020-08-11 中山大学 Pedestrian image detection method acting on target detection system
CN111523478B (en) * 2020-04-24 2023-04-28 中山大学 Pedestrian image detection method acting on target detection system
CN111553321A (en) * 2020-05-18 2020-08-18 城云科技(中国)有限公司 Mobile vendor target detection model, detection method and management method thereof
CN111783772A (en) * 2020-06-12 2020-10-16 青岛理工大学 Grabbing detection method based on RP-ResNet network
CN111626379A (en) * 2020-07-07 2020-09-04 中国计量大学 X-ray image detection method for pneumonia
CN111626379B (en) * 2020-07-07 2024-01-05 中国计量大学 X-ray image detection method for pneumonia
CN112068422A (en) * 2020-08-04 2020-12-11 广州中国科学院先进技术研究所 Grabbing learning method and device of intelligent robot based on small samples
CN112330664B (en) * 2020-11-25 2022-02-08 腾讯科技(深圳)有限公司 Pavement disease detection method and device, electronic equipment and storage medium
CN112330664A (en) * 2020-11-25 2021-02-05 腾讯科技(深圳)有限公司 Pavement disease detection method and device, electronic equipment and storage medium
CN112686297A (en) * 2020-12-29 2021-04-20 中国人民解放军海军航空大学 Radar target motion state classification method and system
CN112633218B (en) * 2020-12-30 2023-10-13 深圳市优必选科技股份有限公司 Face detection method, face detection device, terminal equipment and computer readable storage medium
CN112633218A (en) * 2020-12-30 2021-04-09 深圳市优必选科技股份有限公司 Face detection method and device, terminal equipment and computer readable storage medium
CN113076972A (en) * 2021-03-04 2021-07-06 山东师范大学 Two-stage Logo image detection method and system based on deep learning
CN113762159A (en) * 2021-09-08 2021-12-07 山东大学 Target grabbing detection method and system based on directional arrow model
CN113762159B (en) * 2021-09-08 2023-08-08 山东大学 Target grabbing detection method and system based on directional arrow model
CN115998295A (en) * 2023-03-24 2023-04-25 广东工业大学 Blood fat estimation method, system and device combining far-near infrared light

Similar Documents

Publication Publication Date Title
CN110717532A (en) Real-time detection method for robot target grabbing area based on SE-RetinaGrasp model
CN108665496A (en) A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method
CN109377530A (en) A kind of binocular depth estimation method based on deep neural network
CN111144329A (en) Light-weight rapid crowd counting method based on multiple labels
CN109034184B (en) Grading ring detection and identification method based on deep learning
WO2022095253A1 (en) Method for removing cloud and haze on basis of depth channel sensing
CN112818969A (en) Knowledge distillation-based face pose estimation method and system
CN114092793B (en) End-to-end biological target detection method suitable for complex underwater environment
CN104517126A (en) Air quality assessment method based on image analysis
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN109523558A (en) A kind of portrait dividing method and system
CN110399908A (en) Classification method and device based on event mode camera, storage medium, electronic device
CN110647909A (en) Remote sensing image classification method based on three-dimensional dense convolution neural network
CN112396635A (en) Multi-target detection method based on multiple devices in complex environment
CN116416244A (en) Crack detection method and system based on deep learning
CN116229226A (en) Dual-channel image fusion target detection method suitable for photoelectric pod
CN116258940A (en) Small target detection method for multi-scale features and self-adaptive weights
CN115482529A (en) Method, equipment, storage medium and device for recognizing fruit image in near scene
CN114359578A (en) Application method and system of pest and disease damage identification intelligent terminal
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN113901928A (en) Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN110956115B (en) Scene recognition method and device
CN116563844A (en) Cherry tomato maturity detection method, device, equipment and storage medium
CN111401453A (en) Mosaic image classification and identification method and system
CN116129234A (en) Attention-based 4D millimeter wave radar and vision fusion method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200121