CN112837330A - Leaf segmentation method based on multi-scale double attention mechanism and full convolution neural network - Google Patents
- Publication number
- CN112837330A (application CN202110230518.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- feature
- segmentation
- map
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/11 — Region-based segmentation
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06T3/02 — Affine transformations
- G06T5/90 — Dynamic range modification of images or parts thereof
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20112 — Image segmentation details
- G06T2207/20132 — Image cropping
- G06V2201/07 — Target detection
Abstract
The invention discloses a segmentation system based on a multi-scale double attention mechanism and a fully convolutional neural network, comprising a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, an object detector, a coefficient predictor and a fusion module, where the semantic segmentation network comprises a first convolutional layer, an attention module and a second convolutional layer. The feature extraction backbone network is a VoVNet57 network, which extracts features from the training-set and test-set images and sends them to the feature pyramid network. The feature pyramid network performs same-level feature-map fusion to obtain the P3-P7 feature maps. The P3-P7 feature maps obtained from the feature pyramid fusion network are input to an FCOS object detector, which generates suggestion-box categories and positions pixel by pixel and applies a Soft NMS operation to the suggestion boxes to obtain the final detection boxes. The coefficient predictor performs weight prediction on the instance information of each detection box to generate the instance proportion corresponding to that box. The semantic segmentation network processes the P3-P6 feature maps obtained from the feature pyramid fusion network to generate 4 segmentation maps. The fusion module superimposes the 4 segmentation maps with the detection boxes and outputs the final segmentation map according to the corresponding instance proportions.
Description
Technical Field
The invention relates to an image processing method, in particular to a leaf segmentation method based on a multi-scale double attention mechanism and a fully convolutional neural network.
Background
Plant phenotyping plays an important role in genetics, botany and agronomy. Leaves make up the largest proportion of most plant organs and play a key role in vegetation growth and development, so estimating leaf morphological structure and physiological parameters is important for vegetation growth monitoring. Observing the leaves helps reveal their growth state and ultimately helps us distinguish genetic contributions, improve the genetic characteristics of plants and increase crop yield. In high-throughput phenotypic analysis, automated segmentation of plant leaves is a prerequisite for measuring more complex phenotypic traits. Although leaves have distinctive appearance and shape characteristics, occlusion, variation in leaf shape and pose, and imaging conditions make the problem challenging.
Since the 1980s, many effective methods have been proposed for leaf segmentation, but existing leaf segmentation methods cannot adapt to complex backgrounds, their post-processing is cumbersome, there is still considerable room for improvement in accuracy, and a certain gap remains before real practical application.
The invention aims to overcome the deficiencies of the prior art and provide a leaf segmentation method based on a multi-scale double attention mechanism and a fully convolutional neural network.
Disclosure of Invention
To achieve the object of the invention, the following technical scheme is adopted:
A segmentation system based on a multi-scale double attention mechanism and a fully convolutional neural network comprises a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, a target detector, a coefficient predictor and a fusion module, wherein: the feature extraction backbone network extracts features from the training-set and test-set images and sends them to the feature pyramid network; the feature pyramid network performs same-level feature-map fusion to obtain the P3-P7 feature maps; the P3-P7 feature maps obtained from the feature pyramid fusion network are input to the target detector, which generates the suggestion-box category and position pixel by pixel to obtain the final detection boxes; the coefficient predictor performs weight prediction on the instance information of each detection box to generate the instance proportion corresponding to that box; the semantic segmentation network processes the P3-P6 feature maps obtained from the feature pyramid fusion network to generate 4 segmentation maps; and the fusion module superimposes the 4 segmentation maps with the detection boxes and multiplies them by the corresponding instance proportions to output the final segmentation map.
The segmentation system is characterized in that: the feature extraction backbone network is a VoVNet57 network comprising 3 convolutional layers followed by 4 OSA modules in sequence, each OSA module consisting of 5 convolutional layers with identical input/output channels. The input to the VoVNet57 network is the original RGB picture; the 3 convolutional layers output a set of 128-channel feature maps, which are fed to the first OSA module, whose output serves as the input to the next OSA module, and so on in turn. The output of each OSA module is retained, so the feature maps output by VoVNet57 have four levels, of sizes 1/4, 1/8, 1/16 and 1/32 of the original image, with 256, 512, 768 and 1024 channels respectively.
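As a rough illustration of the shape bookkeeping above, the four OSA stage outputs can be tabulated for a square input (a sketch only, not the patent's implementation; the strides and channel counts are taken from the text):

```python
# Sketch of the VoVNet57 backbone output shapes described above.
# Strides (1/4 ... 1/32) and channel counts (256 ... 1024) come from
# the text; the function itself is purely illustrative.
def backbone_output_shapes(height, width):
    """Return (channels, h, w) for the four OSA stage outputs."""
    strides = [4, 8, 16, 32]          # 1/4, 1/8, 1/16, 1/32 of the input
    channels = [256, 512, 768, 1024]  # per-stage output channels
    return [(c, height // s, width // s) for c, s in zip(channels, strides)]

shapes = backbone_output_shapes(512, 512)
```

For a 512x512 input this gives (256, 128, 128) for the first stage down to (1024, 16, 16) for the last, matching the 1/4 to 1/32 scales listed above.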
The segmentation system is characterized in that: the 1/32 feature map output by the last OSA module of the VoVNet57 network is input to a feature pyramid network, the feature pyramid network performs convolution and downsampling operations on the feature map to finally generate a feature map with the size of the original image 1/128, the feature pyramid network performs progressive upsampling on the feature map to respectively generate feature maps with the sizes of the original images 1/64, 1/32, 1/16 and 1/8, the feature maps with the sizes of the original images 1/32, 1/16 and 1/8 and feature maps with corresponding sizes generated by a feature extraction backbone network are fused, and the fused feature map and the feature maps of 1/128 and 1/64 are used as P3-P7 feature maps finally generated by the feature pyramid network.
The segmentation system is characterized in that: the target detector is an FCOS target detector, and the target detector obtains the category and the position coordinate value of the suggestion box of each feature map pixel by pixel through classification and regression calculation of the P3-P7 feature maps obtained through the feature pyramid fusion network.
The segmentation system is characterized in that: note Fi∈RH*W*CIs the feature at the ith layer of the feature pyramid fusion network, wherein H, W, C represents the height, width and channel number of the feature map respectively, and the real frame provided by the training set is defined asThe 4 coordinates respectively represent the abscissa and ordinate of the upper left corner and the abscissa and ordinate of the lower right corner of the real frame, and c represents the category of the real frame; for any position (x, y) on the feature map by the formula:mapping it back to the input image, s being the total step size before this layer, the FCOS head detection network regressing the detection frame directly for each position; if position (x, y) is inside the real box, then it is considered a positive sample, otherwise it is considered a negative sample; FCOS defines a 4-dimensional vector t*=(l*,t*,r*,b*) As a regression target, these four quantities represent the distances of this position to the four sides of the frame, respectively; if one position is in a plurality of suggestion boxes, selecting the suggestion box with the smallest area as a target suggestion box; the last layer of the network of the target detector predicts the 1-dimensional classification label and 4-dimensional frame information of the target suggestion frame, and for classification and regression, 4 convolution layers are added behind the features respectively, and because the regression result is positive, exp (x) is used for de-mapping the true value in the regression branch; the loss function is as follows:
wherein L isclsIs the adaptive loss, LregIs the loss of cross-over ratio, NposIs the number of positive samples, λ is 1; FCOS detects objects of different sizes in different layer features, and uses 5 feature layers { P }3,P4,P5,P6,P7Step s of 8, 16, 32, 64, 128, respectively; FCOS adds a single branch to predict the centrality of a location by:BCE loss is adopted during training, and the centrality needs to be multiplied to the score of the classification during the reasoning phase, so that the prediction generated by the position far away from the target center can be restrained.
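The centerness target defined above can be computed directly from the regression distances $(l^*, t^*, r^*, b^*)$; a minimal pure-Python sketch (positions at the exact box center score 1.0, positions near an edge score close to 0):

```python
import math

# Centerness target from the FCOS regression distances (l*, t*, r*, b*):
# sqrt( min(l, r)/max(l, r) * min(t, b)/max(t, b) )
def centerness(l, t, r, b):
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

# A position at the exact center of a box has l == r and t == b:
assert centerness(10, 10, 10, 10) == 1.0
```

A position 1 pixel from the left edge of an 11-pixel-wide box, e.g. `centerness(1, 10, 9, 10)`, scores about 0.33, which is what lets the multiplied classification score suppress off-center predictions.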
The segmentation system, wherein the target detector performs a Soft NMS operation on the suggestion boxes to obtain the final detection boxes, comprising:
a. first, compute the intersection-over-union (IoU) between the different suggestion boxes;
b. for a suggestion box whose IoU exceeds the threshold, the confidence score is not set to 0 directly; instead the score of the box is reduced, as shown in the following formula:

$$C_i = \begin{cases} C_i, & iou(M, d_i) < H_t \\ C_i \left(1 - iou(M, d_i)\right), & iou(M, d_i) \ge H_t \end{cases}$$

where $C_i$ is the suggestion-box score of each leaf, $M$ is the suggestion box with the highest current score, $d_i$ is one of the remaining suggestion boxes, $H_t$ is the preset confidence threshold and $iou$ is the intersection-over-union; the suggestion boxes whose score is greater than a preset value are output as detection boxes.
The segmentation system, wherein the Soft NMS operation on the suggestion boxes further comprises:
c. if the IoU between the current highest-scoring suggestion box $M$ and a remaining suggestion box $d_i$ is greater than or equal to the preset threshold $H_t$, a Gaussian weighting function replaces the lower branch of the formula in step b, as shown below:

$$C_i = C_i \, e^{-\frac{iou(M, d_i)^2}{\sigma}}, \quad \forall d_i \notin D$$

where $\sigma$ is the width parameter of the function and $D$ is the set of retained detection boxes.
The segmentation system of, wherein: the coefficient predictor is used for carrying out weight prediction on the example information of the detection frame output by the target detector so as to generate an example proportion corresponding to the detection frame.
The segmentation system of, wherein: the semantic segmentation network comprises a first convolution layer, an attention module and a second convolution layer, wherein the first convolution layer carries out feature extraction on a P3-P6 feature map obtained through the feature pyramid fusion network, the attention module further promotes network expression of features extracted by the first convolution layer and outputs the network expression to the second convolution layer, and the second convolution layer generates 4 global segmentation maps after upsampling the output of the attention module.
The segmentation system of, wherein: feature map of a feature pyramid networkInputting the first convolution layer to expand the channel and fully extract the features, thereby generating a new feature mapWherein R is a real number domain, C, H and W respectively represent the channel number, length and width of the characteristic diagram, and then a multi-scale double attention module is applied to sequentially infer the space attention diagramAnd channel attention mapWhere R is the real number domain, C, H and W represent the number of channels, length and width of the feature map, respectively, and the whole process can be summarized as follows:
wherein M isoutIs the final output of the multi-scale dual attention module,representing the product of the matrix elements.
The segmentation system of, wherein: the multi-scale dual attention module includes a spatial attention module and a channel attention module, the spatial attention module generates two feature descriptors along a channel using global average pooling and maximum pooling:andwhere R is the real number domain, C, H and W represent the number of channels, length and width of the feature map, respectively, and the spatial attention module concatenates the descriptors and applies the convolutional layer to generate a global spatial attention mapWhere R is the real number domain, C, H and W represent the number of channels, length and width of the feature map, respectively, and the first convolution kernel of the convolution layer of the spatial attention module is sized toWhere R is the real number field, C, H and W represent the number of channels, length and width of the feature map, respectively, where R is the rate of reduction of the channel size, and thus, the global spatial context attention map MS_GThrough the lower partCalculating the formula:
whereinRepresenting the splicing operation, B is batch normalization, δ is the activation function ReLU. Wherein W0And PW17 × 7 layers and point convolution layers respectively, the convolution kernel size of which is respectivelyAnd C × H × W;
the spatial attention module performs a convolution operation on each spatial location using point convolution as a local context extractor, the local contextCalculated by the following formula:
MS_L(Min)=B(PW1(δ(B(PW0(Min)))))
finally, the feature matrix is combined and output by using broadcast addition, and the feature graph output by the spatial attention moduleComprises the following steps:
the channel attention module puts the feature map output by the spatial attention module into a global space module of the channel attention module to generate two groups of channel descriptors:andboth descriptors are forwarded to a shared multi-layered convolutional subnet to generate a global channel attention mapThe shared subnet is composed of two point convolutional layers,
the channel attention module is inserted into the local branch in parallel and keeps the same architecture as the local space attention, and finally, the feature map of the multi-scale double attention module is summarized and output by using broadcast addition,the specific calculation is as follows:
the segmentation system of, wherein: the fusion module superposes the 4 global segmentation graphs with the detection frames, multiplies the instance proportion corresponding to the detection frames by the instance proportion, and then outputs a final segmentation graph, which comprises the following steps:
a. crop the 4 global segmentation maps with all detection boxes to obtain the segmentation-map region corresponding to each detection box;
b. interpolate each cropped region to resize it to match the instance-proportion matrix;
c. multiply each resized region by its corresponding instance proportion to obtain a segmentation map for each detection box;
d. add and merge the segmentation maps of all detection boxes to generate the final segmentation map.
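The steps above can be sketched in plain Python (a simplified illustration: nearest-neighbour resizing stands in for the interpolation, boxes are (x0, y0, x1, y1) in map coordinates, and none of the function names come from the patent):

```python
def crop(seg, box):
    """Crop an H x W segmentation map to box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in seg[y0:y1]]

def resize_nn(region, out_h, out_w):
    """Nearest-neighbour resize of a nested-list region (stand-in for interpolation)."""
    in_h, in_w = len(region), len(region[0])
    return [[region[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)] for i in range(out_h)]

def fuse(seg_map, boxes, coeffs, out_h, out_w):
    """Crop per box, resize, scale by its instance coefficient, and sum."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for box, k in zip(boxes, coeffs):
        r = resize_nn(crop(seg_map, box), out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                out[i][j] += k * r[i][j]
    return out
```

A real implementation would operate on tensors and bilinear interpolation, but the crop-resize-scale-sum structure is the same as steps a-d.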
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the overall network architecture;
FIG. 3 is a schematic diagram of a multi-scale dual attention module;
FIG. 4 is a schematic view of a channel attention module;
FIG. 5 is a schematic view of a spatial attention module;
FIG. 6 is a diagram illustrating a split branch operation;
FIG. 7 is a diagram of the segmentation results.
Detailed Description
The invention is described in detail below with reference to FIGS. 1-7.
As shown in fig. 2, the present invention provides a segmentation system based on a multi-scale double-attention mechanism and a full convolution neural network, which includes a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, an object detector, a coefficient predictor and a fusion module, wherein the semantic segmentation network includes a first convolution layer, an attention module and a second convolution layer.
As shown in FIG. 1, the leaf segmentation method based on the multi-scale double attention mechanism and the full convolution neural network of the invention comprises the following steps:
(1) Obtain the dataset provided by the Leaf Segmentation Challenge (LSC) and decompress the H5 files to obtain the original usable pictures. The original usable pictures include the original RGB images, label maps, binary maps and leaf-center maps.
The primary directory of the dataset contains four folders: A1, A2 and A4 mainly store time-lapse top-view images of Arabidopsis (993 original RGB pictures), while A3 mainly stores time-lapse top-view images of tobacco (83 original RGB pictures). Each of the four folders includes both a training set and a test set; the training set contains the original RGB images, label maps, binary maps and leaf-center maps, while the test set contains only the original RGB images and binary maps.
(2) The training set is converted to the COCO_2017 data format for ease of manipulation and processing.
The primary directory of a dataset in COCO_2017 format includes two folders: an annotations folder storing the annotation json files, and a train2017 folder storing the original images.
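The COCO annotation layout referred to above is a single JSON file per split; here is a minimal sketch of the structure (field names follow the COCO convention; the file name, sizes and coordinates are invented placeholders, not values from the patent's dataset):

```python
import json

# Minimal COCO-style annotation skeleton (illustrative values only).
coco = {
    "images": [
        # one entry per picture in train2017/
        {"id": 1, "file_name": "plant_0001.png", "width": 500, "height": 530},
    ],
    "annotations": [
        # one entry per leaf instance; bbox is [x, y, width, height]
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [120, 80, 60, 45], "area": 2700.0,
         "segmentation": [[120, 80, 180, 80, 180, 125, 120, 125]],
         "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "leaf"}],
}

text = json.dumps(coco)   # this is what goes into annotations/*.json
loaded = json.loads(text)
```

Converting the LSC label maps amounts to emitting one `annotations` entry per labelled leaf region with its bounding box and polygon.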
(3) And carrying out image enhancement operation on the sample pictures in the training set so as to expand the training samples.
Randomly select sample pictures from the training set and apply at least one of the following operations to increase the number of samples: 1) horizontal and vertical flipping; 2) affine transformation, including translation, scaling and rotation; 3) brightness adjustment that darkens the image. This improves the generalization ability of the model, reduces overfitting and effectively improves detection and segmentation accuracy.
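A plain-Python sketch of these augmentations, applied to an image stored as an H x W nested list (the affine transforms are omitted here for brevity; a real pipeline would use an image library for those):

```python
import random

def hflip(img):
    """Horizontal flip of an H x W nested-list image."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip."""
    return img[::-1]

def darken(img, factor=0.7):
    """Simple brightness reduction (factor < 1 darkens the image)."""
    return [[p * factor for p in row] for row in img]

def augment(img, rng=random):
    """Apply at least one randomly chosen operation, as in step (3)."""
    ops = [hflip, vflip, darken]
    for op in rng.sample(ops, k=rng.randint(1, len(ops))):
        img = op(img)
    return img
```

Note that for detection/segmentation training the same geometric transforms must also be applied to the label maps, which this sketch does not show.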
(4) Input the augmented training-set leaf images and the test-set leaf images into the feature extraction network. The feature extraction network comprises a feature extraction backbone network and a feature pyramid network; the feature extraction backbone network is a VoVNet57 network.
(5) Extract the features of the training-set and test-set images using a pre-trained VoVNet57 network, which comprises 3 convolutional layers followed by 4 OSA modules in sequence. The input to VoVNet57 is the original RGB picture; the 3 convolutional layers output a set of 128-channel feature maps, which are fed to the first OSA module, whose output serves as the input to the next OSA module, and so on in turn. The output of each OSA module is retained, so the feature maps output by VoVNet57 have four levels, of sizes 1/4, 1/8, 1/16 and 1/32 of the original image, with 256, 512, 768 and 1024 channels respectively. The core modules of the VoVNet57 network are the OSA modules: each consists of 5 convolutional layers with identical input/output channels whose features are aggregated simultaneously into the last layer. Each convolutional layer has two connections, one to the next layer to produce features with a larger receptive field, the other aggregated only into the final output feature map. This effectively solves the information redundancy caused by the dense connections of traditional feature extraction backbones, enhances the feature extraction effect, and improves extraction speed and GPU computing efficiency; the pre-trained VoVNet57 network also accelerates model convergence and improves model performance.
(6) Input the feature maps obtained by the VoVNet57 feature extraction backbone into the feature pyramid network, which builds a feature pyramid from the hierarchical semantic features of the convolutional network. The 1/32-scale feature map output by the last OSA module of the VoVNet57 network is input to the feature pyramid network, which applies convolution and downsampling to generate a feature map of 1/128 the original image size. The feature pyramid network then progressively upsamples this map to generate feature maps of 1/64, 1/32, 1/16 and 1/8 the original image size; the 1/32, 1/16 and 1/8 maps are fused with the corresponding-size maps produced by the feature extraction backbone, and the fused maps together with the 1/128 and 1/64 maps form the finally generated pyramid levels. The feature pyramid processing combines a top-down pathway with lateral connections: the small top-level feature map is enlarged by upsampling to the size of the feature map of the preceding stage, exploiting both the strong top-level semantics (helpful for classification) and the high-resolution bottom-level information (helpful for localization); the upsampling is realized by nearest-neighbour interpolation.
To combine the high-level semantic features with the precise localization capability of the bottom levels, a lateral connection structure similar to a residual network is adopted: each lateral connection fuses, by addition, the upsampled feature map of the level above with the same-resolution map of the current level, finally producing the P3-P7 feature maps (i.e., the feature maps of sizes 1/8, 1/16, 1/32, 1/64 and 1/128 of the original image). In this way the feature pyramid network further improves semantic expression through context information and increases feature-map resolution, better preserving information about small target objects and outputting features with stronger expressive power.
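The top-down pathway just described (nearest-neighbour upsampling plus an additive lateral connection) can be sketched in plain Python on nested-list feature maps (illustrative only; real feature maps also carry a channel dimension):

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W nested-list map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def lateral_add(top, lateral):
    """Upsample the coarser map and add the same-resolution lateral map."""
    up = upsample2x(top)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(up, lateral)]
```

Applied repeatedly from the 1/128 map downward, this is the fusion that produces the P5, P4 and P3 levels from their coarser neighbours and the backbone's lateral maps.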
(7) Input the P3-P7 feature maps (i.e., the feature maps of sizes 1/8, 1/16, 1/32, 1/64 and 1/128 of the original image) obtained from the feature pyramid fusion network into the FCOS target detector, which obtains the suggestion-box category and position coordinates of each feature map pixel by pixel through classification and regression. Let $F_i \in \mathbb{R}^{H \times W \times C}$ be the feature at the $i$-th level of the feature pyramid fusion network, where $\mathbb{R}$ denotes the real domain and $H$, $W$, $C$ are the feature-map height, width and number of channels. A ground-truth box provided by the training set is defined as $B = (x_0, y_0, x_1, y_1, c)$, the four coordinates being the abscissa and ordinate of the upper-left and lower-right corners and $c$ the box category. Any position $(x, y)$ on the feature map can be mapped back to the input image by

$$\left(\left\lfloor \tfrac{s}{2} \right\rfloor + xs, \; \left\lfloor \tfrac{s}{2} \right\rfloor + ys\right)$$

where $s$ is the total stride before this level, and the FCOS target detector regresses a suggestion box directly for each position. If position $(x, y)$ on the feature map lies inside a ground-truth box it is treated as a positive sample, otherwise as a negative sample. The FCOS target detector defines a 4-dimensional vector $t^* = (l^*, t^*, r^*, b^*)$ as the regression target, the four quantities being the distances from the position to the four sides of the suggestion box. If a position lies inside several suggestion boxes, the one with the smallest area is selected as the target suggestion box. The last layer of the target detector predicts the 1-dimensional classification label and 4-dimensional box information of the target suggestion box, with the loss function:

$$L(\{p_{x,y}\}, \{t_{x,y}\}) = \frac{1}{N_{pos}} \sum_{x,y} L_{cls}(p_{x,y}, c^*_{x,y}) + \frac{\lambda}{N_{pos}} \sum_{x,y} \mathbb{1}_{\{c^*_{x,y} > 0\}} \, L_{reg}(t_{x,y}, t^*_{x,y})$$

where $L_{cls}$ is the adaptive classification loss, $L_{reg}$ is the IoU loss, $N_{pos}$ is the number of positive samples, $\lambda = 1$, $\sum_{x,y}$ sums over all positions, $c^*_{x,y}$ denotes the category of the suggestion box at position $(x, y)$, $\mathbb{1}_{\{c^*_{x,y} > 0\}}$ is the indicator function (1 if $c^*_{x,y} > 0$, otherwise 0), $t_{x,y}$ and $t^*_{x,y}$ represent the predicted and ground-truth box positions, and $p_{x,y}$ is the class probability, i.e., the confidence score of the suggestion box at that position, obtained by the SoftMax function:

$$p_i = \frac{e^{V_i}}{\sum_j e^{V_j}}$$

where $e$ is the exponential base, $V_i$ is the value on the $i$-th neuron and $\sum_j$ sums over all neurons. FCOS detects objects of different sizes on different feature levels, using the 5 feature levels $\{P_3, P_4, P_5, P_6, P_7\}$ with strides $s$ of 8, 16, 32, 64 and 128 respectively. FCOS adds a single branch to predict the centerness of a position by:

$$centerness^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$

where $\min$ and $\max$ are the minimum and maximum functions and $l^*$, $r^*$, $t^*$ and $b^*$ are the distances from the position to the four sides of the box. Binary cross-entropy loss is adopted during training, and at the inference stage the centerness is multiplied into the classification score, so that predictions generated by positions far from the target center are suppressed.
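The position mapping and regression-target assignment above can be sketched in plain Python (boxes in the (x0, y0, x1, y1) format used here):

```python
def map_to_image(x, y, s):
    """Map feature-map position (x, y) back to the input image (stride s)."""
    return (s // 2 + x * s, s // 2 + y * s)

def regression_target(px, py, box):
    """(l*, t*, r*, b*) for image point (px, py) inside box, else None."""
    x0, y0, x1, y1 = box
    if not (x0 <= px <= x1 and y0 <= py <= y1):
        return None  # negative sample
    return (px - x0, py - y0, x1 - px, y1 - py)
```

For example, position (3, 2) on the stride-8 level P3 maps back to image point (28, 20); if that point lies inside a box, the four returned distances are the FCOS regression targets for that position.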
(8) The target detector performs a Soft-NMS operation on the proposed boxes to obtain the final detection boxes. To ensure the recall rate of object detection, traditional target detection methods usually apply non-maximum suppression (NMS): the generated proposed boxes are sorted by confidence score, the box with the highest score is kept, and other boxes whose overlap with it exceeds a certain proportion are deleted. Although simple and effective, NMS has a notable problem: it forces the scores of all adjacent proposed boxes to zero, so if a real object lies in the overlap region, detection of that object fails. Soft-NMS proceeds as follows:
a. First, sort all proposed boxes by confidence score;
b. Select the proposed box with the highest confidence score and compute its intersection ratio with each of the remaining proposed boxes;
c. Recalculate the confidence score of each remaining proposed box from the intersection ratio, as given by: C_i = C_i, if iou(M, d_i) < H_t; C_i · (1 − iou(M, d_i)), if iou(M, d_i) ≥ H_t;
where C_i is the confidence score of each leaf proposed box, M is the proposed box with the highest current confidence score, d_i is one of the remaining proposed boxes, H_t is the set confidence threshold and iou is the intersection ratio; proposed boxes whose score is greater than a preset value (for example, 0.6) are output as detection boxes;
d. When the intersection ratio is smaller than the threshold, the confidence scores of the remaining proposed boxes are unchanged;
e. When the intersection ratio is greater than or equal to the threshold, the confidence score of the remaining proposed box is not set to 0 directly; instead, its score is decreased: if the intersection ratio of the current proposed box M and a remaining proposed box d_i is greater than or equal to the set threshold H_t, the confidence score of that box decreases linearly. However, the value C_i given by the formula in step c is not a continuous function: when the intersection ratio of a proposed box d_i with M exceeds the threshold H_t, the score jumps, and this jump causes large fluctuations in the detection result. A more stable and continuous score-resetting function is therefore needed in place of the lower half of the formula in step c, and a Gaussian weighting function is used: C_i = C_i · e^(−iou(M, d_i)² / σ), for all d_i ∈ D, where σ is the width parameter of the function and D is the set of remaining proposed boxes;
f. Update the confidence score of each remaining proposed box in turn;
g. Re-sort the remaining proposed boxes by confidence score and repeat the above steps until all proposed boxes have been processed.
The computational complexity of Soft-NMS is the same as that of NMS, and its score-decay scheme effectively improves the recall rate of the model, thereby improving the detection results.
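A minimal sketch of steps a-g with the Gaussian decay (plain-Python helper names are hypothetical; boxes are (x0, y0, x1, y1) tuples):

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: pick the highest-scoring box, decay the
    scores of the rest by exp(-iou^2 / sigma) instead of zeroing
    them, drop boxes falling below score_thresh, and repeat."""
    dets = list(zip(boxes, scores))
    kept = []
    while dets:
        i = max(range(len(dets)), key=lambda k: dets[k][1])
        m, s = dets.pop(i)
        kept.append((m, s))
        dets = [(b, sj * math.exp(-iou(m, b) ** 2 / sigma))
                for b, sj in dets]
        dets = [(b, sj) for b, sj in dets if sj > score_thresh]
    return kept
```

Note that a fully overlapping duplicate is kept with a heavily decayed score rather than being deleted outright, which is exactly the recall-preserving behaviour motivated above.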
(9) The detection boxes are sent to a coefficient predictor, which performs weight prediction on the instance information of each detection box to generate the instance coefficients corresponding to that box. The coefficient predictor is a convolutional layer whose output is a 3D tensor that encodes instance-level information such as the rough shape and pose of an object.
(10) Features are extracted from the P3-P6 feature maps obtained by the feature pyramid fusion network through the first convolution layer of the semantic segmentation network; an attention module further refines the features extracted by the first convolution layer and outputs the adjusted feature map to a second convolution layer, which upsamples the output of the attention module to generate 4 global segmentation maps. To better improve the expressive power of the segmentation network and focus correctly on the target object, the invention provides a novel multi-scale dual attention module with spatial and channel descriptors. As shown in fig. 3, the feature map output by the first convolution layer passes through the spatial attention module and then the channel attention module to generate an attention weight map, and this weight map is multiplied element-wise with the input feature map to produce the final adjusted feature map. The module aggregates global and local features (as shown in fig. 2). The feature map F ∈ R^(C×H×W) of the feature pyramid network is taken as input, where R is the real number domain and C, H and W denote the number of channels, height and width of the feature map, respectively. First, the feature map is input into the first convolution layer to expand the channels and fully extract features, generating a new feature map M_in ∈ R^(C×H×W). The multi-scale dual attention module is then applied to sequentially infer a spatial attention map W_s ∈ R^(1×H×W) and a channel attention map W_c ∈ R^(C×1×1).
The whole process can be summarized as: M_out = W_c ⊙ (W_s ⊙ M_in), where M_out is the final output of the multi-scale dual attention module, M_in is the input feature map, ⊙ denotes the element-wise product of matrix elements, W_s is the spatial attention map and W_c is the channel attention map. Considering the arrangement order of the two modules, an ablation experiment was designed to select the optimal arrangement; the experiment showed that placing the spatial attention module before the channel attention module gives the highest accuracy, so this strategy was chosen.
As shown in fig. 4, the spatial attention module focuses mainly on the spatial dependencies of the convolutional features and generates a spatial attention matrix to highlight information-rich areas. When computing spatial attention, the module generates two feature descriptors along the channel axis using global average pooling and global max pooling: F_avg ∈ R^(1×H×W) and F_max ∈ R^(1×H×W), where R is the real number domain and H and W are the height and width of the feature map. These descriptors summarize the max-pooled and average-pooled features. Next, the spatial attention module concatenates the descriptors and applies convolutional layers to generate the global spatial attention map M_S_G ∈ R^(1×H×W). To reduce the number of parameters and improve training robustness, the first convolution kernel of the convolutional layer is sized (C/r)×7×7, where r is the channel reduction rate. The global spatial attention map M_S_G can thus be calculated as:
M_S_G(M_in) = B(PW_1(δ(B(W_0([MaxPool(M_in); AvgPool(M_in)])))))
where [;] denotes the concatenation operation, B is batch normalization, MaxPool is max pooling, AvgPool is average pooling, M_in is the input feature map, δ is the ReLU activation function, and W_0 and PW_1 are a 7×7 convolutional layer and a point convolutional layer, respectively; this composition of W_0 and PW_1 forms the multilayer convolution MLConv.
A parallel local branch is added to the spatial attention module to enrich per-element context and improve multi-scale information expression. In this branch, the spatial attention module uses point convolutions (kernel size 1) as a local context extractor, performing a convolution at each spatial location. The local context M_S_L can thus be calculated as:
M_S_L(M_in) = B(PW_1(δ(B(PW_0(M_in)))))
where B is batch normalization, PW_0 and PW_1 are point convolutional layers, M_in is the input feature map, and δ is the ReLU activation function. Finally, the feature matrices are combined using broadcast addition, and the feature map M_out' output by the improved spatial attention module is:
M_out' = M_in ⊙ σ(C_G ⊕ C_L)
where ⊕ denotes matrix (broadcast) addition, ⊙ denotes the element-wise product of matrix elements, σ is the Sigmoid activation function, M_in is the input feature map, and C_G and C_L are the global spatial attention and the local spatial attention, respectively.
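A toy numpy sketch of the spatial-attention data flow (channel-wise max/average pooling → attention map → sigmoid gating); the learned convolutions, batch normalization and the local branch are collapsed into a simple sum here, so this only illustrates the shapes and the element-wise reweighting, not trained behaviour:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    """feat: (C, H, W). Pool over the channel axis to get the two
    1 x H x W descriptors, fuse them (a stand-in for the learned
    conv layers), squash with sigmoid and reweight the input."""
    max_desc = feat.max(axis=0, keepdims=True)    # (1, H, W) max-pooled
    avg_desc = feat.mean(axis=0, keepdims=True)   # (1, H, W) avg-pooled
    attn = sigmoid(max_desc + avg_desc)           # (1, H, W) attention map
    return feat * attn                            # broadcast over channels
```

The output has the same shape as the input, with each spatial location scaled by its attention weight in (0, 1).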
As shown in FIG. 5, unlike the spatial attention module, the channel attention module captures the inter-dependencies between channels and learns the inter-channel relationships of the features, with the goal of using more information to assign higher weights to the informative channels. To compute the channel attention map efficiently, the feature map output by the spatial attention module is fed into the global branch of the channel attention module, generating two sets of channel descriptors: F_max ∈ R^(C×1×1) and F_avg ∈ R^(C×1×1). Both descriptors are forwarded to a shared multilayer convolutional subnet to generate the global channel attention map M_C_G ∈ R^(C×1×1). The shared subnet consists of two point convolutional layers instead of two fully connected layers, and is calculated as:
M_C_G(M_out') = B(PW_1(δ(B(PW_0(MaxPool(M_out')))))) ⊕ B(PW_1(δ(B(PW_0(AvgPool(M_out'))))))
where MaxPool is global max pooling, AvgPool is global average pooling, M_out' is the feature map output by the spatial attention module, PW_0 and PW_1 are point convolutional layers, B is batch normalization and δ is the ReLU activation function. Similar to the spatial attention module, a local branch is inserted into the channel attention module in parallel; in this branch, point convolution is used as a local context extractor to perform a convolution at each position. Finally, broadcast addition is used to combine the branches, and the output feature map of the multi-scale dual attention module is:
M_out = M_out' ⊙ σ(C_G ⊕ C_L)
where M_out' is the feature map output by the spatial attention module, σ is the Sigmoid activation function, C_G and C_L are the global channel attention and the local channel attention, respectively, ⊙ denotes the element-wise product of matrix elements, and ⊕ denotes matrix (broadcast) addition.
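A numpy sketch of the channel-attention branch under the same simplifications (the shared two-layer point-convolution subnet is reduced to two small weight matrices w0 and w1, which are hypothetical stand-ins for the learned layers; the local branch is omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w0, w1):
    """feat: (C, H, W). Global max/avg pooling gives two (C,)
    channel descriptors; both pass through the shared subnet
    w1 @ ReLU(w0 @ d), are summed, squashed with sigmoid, and
    broadcast-multiplied back onto the feature map per channel."""
    max_d = feat.max(axis=(1, 2))     # (C,) global max descriptor
    avg_d = feat.mean(axis=(1, 2))    # (C,) global average descriptor
    shared = lambda d: w1 @ np.maximum(w0 @ d, 0.0)  # shared subnet
    attn = sigmoid(shared(max_d) + shared(avg_d))    # (C,) weights
    return feat * attn[:, None, None]                # broadcast multiply
```

Each channel is scaled by a single weight, in contrast to the spatial branch, which scales each location.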
(11) The fusion module superimposes the 4 global segmentation maps with the detection boxes, multiplies them by the instance coefficients corresponding to each detection box, and outputs the final segmentation map, as shown in fig. 6.
a. Crop the 4 global segmentation maps with all detection boxes to obtain the segmentation-map region corresponding to each detection box;
b. Interpolate each cropped region so that its size matches the instance coefficient matrix;
c. Multiply each adjusted region by the corresponding instance coefficients to obtain a segmentation map for each detection box;
d. Add the segmentation maps of all detection boxes together to generate the final segmentation map.
The fusion module itself is translation-variant, which enables the network to use different activations to distinguish and locate individual leaves. The whole process of steps a-d above can be expressed as: S = Σ ( I(RoIAlign(bases, proposals)) ⊙ coefficients ), where proposals are the detection boxes, bases are the regions of the segmentation maps corresponding to the detection boxes, coefficients are the predicted instance coefficients of the detection boxes, I is the linear interpolation operation and RoIAlign is the operation that fixes the size of the detection box region.
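A toy numpy sketch of steps a-d, with a scalar coefficient per box standing in for the full coefficient tensor and the resizing step omitted, so only the crop-weight-sum structure is shown:

```python
import numpy as np

def fuse(seg_map, boxes, coeffs):
    """seg_map: (H, W) global segmentation map; boxes: integer
    (x0, y0, x1, y1) detection boxes; coeffs: one scalar instance
    coefficient per box. Crop each box region, weight it by its
    coefficient, and sum the per-box maps (steps a, c, d)."""
    out = np.zeros_like(seg_map, dtype=float)
    for (x0, y0, x1, y1), c in zip(boxes, coeffs):
        out[y0:y1, x0:x1] += seg_map[y0:y1, x0:x1] * c
    return out
```

Overlapping boxes simply accumulate their weighted contributions, matching the "add and combine" of step d.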
(12) Train the instance segmentation model, iteratively optimizing the model parameters based on the defined loss function until the model converges, and save the trained model; then test the trained model on the test data set to realize real-time segmentation of leaf images. FIG. 7 illustrates the segmentation effect of the trained model on pictures in the test set.
The invention can obtain the following beneficial effects:
(1) The invention adopts a one-stage target detection branch, which improves detection speed;
(2) The method performs data enhancement on the training samples using operations such as flipping, affine transformation, illumination adjustment and brightness/contrast transformation, enriching the image data, expanding the scale of the data set, alleviating the shortage of samples, and enhancing the robustness and generalization ability of the model;
(3) The invention adopts an FPN to extract features, removing the parameter-setting burden of traditional detection methods based on manually extracted features such as edges, contours and textures;
(4) The invention realizes automatic leaf segmentation using computer vision technology; compared with manual inspection, it saves labor cost, improves production efficiency and genuinely enables unmanned agricultural management;
(5) The invention provides a novel multi-scale dual-attention mechanism that improves the expressive power of the segmentation network in both local and global dimensions;
(6) The invention effectively embeds the attention module into the segmentation network and generates corresponding position-sensitive segmentation maps, which helps distinguish between leaves.
Claims (3)
1. A segmentation system based on a multi-scale dual-attention mechanism and a fully convolutional neural network, comprising a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, a target detector, a coefficient predictor and a fusion module, characterized in that: the feature extraction backbone network extracts features from the training set images and test set images and sends them to the feature pyramid network; the feature pyramid network performs same-level feature map fusion to obtain the P3-P7 feature maps; the P3-P7 feature maps obtained through the feature pyramid fusion network are input into the target detector, which generates the proposed box category and position pixel by pixel to obtain the final detection boxes; the coefficient predictor performs weight prediction on the instance information of each detection box to generate the instance coefficients corresponding to that box; the semantic segmentation network processes the P3-P6 feature maps obtained from the feature pyramid fusion network to generate 4 segmentation maps; and the fusion module superimposes the 4 segmentation maps with the detection boxes and multiplies them by the corresponding instance coefficients to output the final segmentation map.
2. The segmentation system of claim 1, wherein: the feature extraction backbone network is the VoVNet57 network.
3. The segmentation system of claim 1, wherein: the coefficient predictor performs weight prediction on the instance information of the detection boxes output by the target detector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110230518.4A CN112837330B (en) | 2021-03-02 | 2021-03-02 | Leaf segmentation method based on multi-scale double-attention mechanism and full convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112837330A true CN112837330A (en) | 2021-05-25 |
CN112837330B CN112837330B (en) | 2024-05-10 |
Family
ID=75934347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110230518.4A Active CN112837330B (en) | 2021-03-02 | 2021-03-02 | Leaf segmentation method based on multi-scale double-attention mechanism and full convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112837330B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269139A (en) * | 2021-06-18 | 2021-08-17 | 中电科大数据研究院有限公司 | Self-learning large-scale police officer image classification model aiming at complex scene |
CN113379770A (en) * | 2021-06-30 | 2021-09-10 | 华南理工大学 | Nasopharyngeal carcinoma MR image segmentation network construction method, image segmentation method and device |
CN113469287A (en) * | 2021-07-27 | 2021-10-01 | 北京信息科技大学 | Spacecraft multi-local component detection method based on instance segmentation network |
CN113486930A (en) * | 2021-06-18 | 2021-10-08 | 陕西大智慧医疗科技股份有限公司 | Small intestinal lymphoma segmentation model establishing and segmenting method and device based on improved RetinaNet |
CN113486879A (en) * | 2021-07-27 | 2021-10-08 | 平安科技(深圳)有限公司 | Image area suggestion frame detection method, device, equipment and storage medium |
CN113538347A (en) * | 2021-06-29 | 2021-10-22 | 中国电子科技集团公司电子科学研究院 | Image detection method and system based on efficient bidirectional path aggregation attention network |
CN113658206A (en) * | 2021-08-13 | 2021-11-16 | 江南大学 | Plant leaf segmentation method |
CN113674142A (en) * | 2021-08-30 | 2021-11-19 | 国家计算机网络与信息安全管理中心 | Method, device, computer equipment and medium for ablating target object in image |
CN113780187A (en) * | 2021-09-13 | 2021-12-10 | 南京邮电大学 | Traffic sign recognition model training method, traffic sign recognition method and device |
CN113887455A (en) * | 2021-10-11 | 2022-01-04 | 东北大学 | Face mask detection system and method based on improved FCOS |
CN114037833A (en) * | 2021-11-18 | 2022-02-11 | 桂林电子科技大学 | Semantic segmentation method for Miao-nationality clothing image |
CN114418999A (en) * | 2022-01-20 | 2022-04-29 | 哈尔滨工业大学 | Retinopathy detection system based on lesion attention pyramid convolution neural network |
CN114511576A (en) * | 2022-04-19 | 2022-05-17 | 山东建筑大学 | Image segmentation method and system for scale self-adaptive feature enhanced deep neural network |
CN114581670A (en) * | 2021-11-25 | 2022-06-03 | 哈尔滨工程大学 | Ship instance segmentation method based on spatial distribution attention |
CN114693930A (en) * | 2022-03-31 | 2022-07-01 | 福州大学 | Example segmentation method and system based on multi-scale features and context attention |
CN114913428A (en) * | 2022-04-26 | 2022-08-16 | 哈尔滨理工大学 | Remote sensing image target detection system based on deep learning |
CN115661694A (en) * | 2022-11-08 | 2023-01-31 | 国网湖北省电力有限公司经济技术研究院 | Intelligent detection method, system, storage medium and electronic equipment for light-weight main transformer focusing on key characteristics |
CN116188479A (en) * | 2023-02-21 | 2023-05-30 | 北京长木谷医疗科技有限公司 | Hip joint image segmentation method and system based on deep learning |
CN117152443A (en) * | 2023-10-30 | 2023-12-01 | 江西云眼视界科技股份有限公司 | Image instance segmentation method and system based on semantic lead guidance |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150254843A1 (en) * | 2012-09-13 | 2015-09-10 | The Regents Of The University Of California | Lung, lobe, and fissure imaging systems and methods |
CN111192277A (en) * | 2019-12-31 | 2020-05-22 | 华为技术有限公司 | Instance partitioning method and device |
CN112381835A (en) * | 2020-10-29 | 2021-02-19 | 中国农业大学 | Crop leaf segmentation method and device based on convolutional neural network |
Non-Patent Citations (3)
Title |
---|
ASHISH SINHA ET.AL: "Multi-scale self-guided attention for medical image segmentation", 《ARXIV:1906.02849V3 [CS.CV] 》, pages 3 - 6 * |
HAO CHEN ET.AL: "BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation", 《ARXIV:2001.00309V3 [CS.CV] 》, pages 1 - 9 * |
RUOHAO GUO ET.AL: "LeafMask: Towards Greater Accuracy on Leaf Segmentation", 《ARXIV:2108.03568V1 [CS.CV] 》, pages 1 - 10 * |
Also Published As
Publication number | Publication date |
---|---|
CN112837330B (en) | 2024-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112837330A (en) | Leaf segmentation method based on multi-scale double attention mechanism and full convolution neural network | |
CN110428428B (en) | Image semantic segmentation method, electronic equipment and readable storage medium | |
CN109919108B (en) | Remote sensing image rapid target detection method based on deep hash auxiliary network | |
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN108830326B (en) | Automatic segmentation method and device for MRI (magnetic resonance imaging) image | |
CN108986058B (en) | Image fusion method for brightness consistency learning | |
CN111027493B (en) | Pedestrian detection method based on deep learning multi-network soft fusion | |
CN111797779A (en) | Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion | |
CN113627228B (en) | Lane line detection method based on key point regression and multi-scale feature fusion | |
CN110533022B (en) | Target detection method, system, device and storage medium | |
CN112541508A (en) | Fruit segmentation and recognition method and system and fruit picking robot | |
CN112381764A (en) | Crop disease and insect pest detection method | |
CN112541904A (en) | Unsupervised remote sensing image change detection method, storage medium and computing device | |
CN113706581B (en) | Target tracking method based on residual channel attention and multi-level classification regression | |
CN107506792B (en) | Semi-supervised salient object detection method | |
CN111160407A (en) | Deep learning target detection method and system | |
CN113657560A (en) | Weak supervision image semantic segmentation method and system based on node classification | |
CN112749675A (en) | Potato disease identification method based on convolutional neural network | |
CN115564983A (en) | Target detection method and device, electronic equipment, storage medium and application thereof | |
CN112380917A (en) | A unmanned aerial vehicle for crops plant diseases and insect pests detect | |
Chimakurthi | Application of convolution neural network for digital image processing | |
CN115222998A (en) | Image classification method | |
CN117576079A (en) | Industrial product surface abnormality detection method, device and system | |
CN112418358A (en) | Vehicle multi-attribute classification method for strengthening deep fusion network | |
CN115330759B (en) | Method and device for calculating distance loss based on Hausdorff distance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||