CN112837330A - Leaf segmentation method based on multi-scale double attention mechanism and full convolution neural network - Google Patents

Leaf segmentation method based on multi-scale double attention mechanism and full convolution neural network

Info

Publication number
CN112837330A
CN112837330A
Authority
CN
China
Prior art keywords
network
feature
segmentation
map
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110230518.4A
Other languages
Chinese (zh)
Other versions
CN112837330B (en)
Inventor
李振波
郭若皓
李晔
杨泳波
瞿李傲
岳峻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN202110230518.4A priority Critical patent/CN112837330B/en
Publication of CN112837330A publication Critical patent/CN112837330A/en
Application granted granted Critical
Publication of CN112837330B publication Critical patent/CN112837330B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/02Affine transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a segmentation system based on a multi-scale double-attention mechanism and a fully convolutional neural network, comprising a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, an object detector, a coefficient predictor and a fusion module, wherein the semantic segmentation network comprises a first convolution layer, an attention module and a second convolution layer, and wherein: the feature extraction backbone network is a VoVNet57 network that extracts features of the training-set and test-set images and sends them to the feature pyramid network; the feature pyramid network performs same-level feature map fusion to obtain the P3-P7 feature maps; the P3-P7 feature maps obtained through the feature pyramid fusion network are input into an FCOS object detector, which generates proposal-box categories and positions pixel by pixel and applies a Soft NMS operation to the proposal boxes to obtain the final detection boxes; the coefficient predictor performs weight prediction on the instance information of each detection box to generate the instance coefficients corresponding to that box; the semantic segmentation network processes the P3-P6 feature maps obtained by the feature pyramid fusion network to generate 4 segmentation maps; and the fusion module superimposes the 4 segmentation maps with the detection boxes and outputs the final segmentation map according to the corresponding instance coefficients.

Description

Leaf segmentation method based on multi-scale double attention mechanism and full convolution neural network
Technical Field
The invention relates to an image processing method, in particular to a leaf segmentation method based on a multi-scale double-attention mechanism and a fully convolutional neural network.
Background
Plant phenotyping plays an important role in genetics, botany and agronomy. Among plant organs, leaves account for the largest proportion and play an important role in vegetation growth and development. Estimating leaf morphological structure and physiological parameters is of great significance for monitoring vegetation growth: observing the leaves helps reveal their growth state and ultimately helps distinguish genetic contributions, improve plant genetic traits and increase crop yield. In high-throughput phenotypic analysis, automated segmentation of plant leaves is a prerequisite for measuring more complex phenotypic traits. Although leaves have distinctive appearance and shape characteristics, occlusion and variation in leaf shape, pose and imaging conditions make this problem challenging.
Since the 1980s, many effective methods have been proposed for leaf segmentation, but existing leaf segmentation methods cannot adapt to complex backgrounds, require complicated post-processing, still leave considerable room for improvement in accuracy, and fall short of practical application.
The invention aims to overcome the defects of the prior art and provide a leaf segmentation method based on a multi-scale double-attention mechanism and a fully convolutional neural network.
Disclosure of Invention
To achieve the object of the invention, the following technical solution is adopted:
a segmentation system based on a multi-scale double-attention mechanism and a full convolution neural network comprises a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, a target detector, a coefficient predictor and a fusion module, wherein: the feature extraction backbone network is used for extracting features of the training set images and the test set images and sending the features to the feature pyramid network; the characteristic pyramid network is used for carrying out sibling characteristic map fusion to obtain a P3-P7 characteristic map; inputting a P3-P7 feature map obtained through the feature pyramid fusion network into a target detector, and generating a suggested frame category and a position thereof pixel by the target detector to obtain a final detection frame; the coefficient predictor carries out weight prediction on the example information of the detection frame to generate an example proportion corresponding to the detection frame; the semantic segmentation network is used for processing the P3-P6 feature maps obtained by the feature pyramid fusion network to generate 4 segmentation maps; and the fusion module is used for superposing the 4 segmentation maps and the detection frame and multiplying the superposition with the corresponding example proportion so as to output the final segmentation map.
The segmentation system is characterized in that: the feature extraction backbone network is a VoVNet57 network comprising 3 convolution layers followed by 4 OSA modules in sequence, each OSA module consisting of 5 convolution layers with identical input and output channels; the input of the VoVNet57 network is the original RGB picture, which passes through the 3 convolution layers to output a set of 128-channel feature maps; this set of feature maps is input to the first OSA module, whose output serves as the input of the next OSA module, and so on in turn, the output of each OSA module being retained; the feature maps output by VoVNet57 therefore have four levels, with sizes 1/4, 1/8, 1/16 and 1/32 of the original image and channel counts of 256, 512, 768 and 1024 respectively.
The segmentation system is characterized in that: the 1/32-scale feature map output by the last OSA module of the VoVNet57 network is input to the feature pyramid network, which applies convolution and downsampling to it to generate a feature map of 1/128 the original image size; the feature pyramid network then progressively upsamples this map to generate feature maps of 1/64, 1/32, 1/16 and 1/8 the original image size, fuses the 1/32, 1/16 and 1/8 maps with the corresponding-size feature maps produced by the feature extraction backbone network, and takes the fused maps together with the 1/128 and 1/64 maps as the P3-P7 feature maps finally generated by the feature pyramid network.
The segmentation system is characterized in that: the object detector is an FCOS object detector, which obtains the category and position coordinates of the proposal box for every pixel of each feature map through classification and regression on the P3-P7 feature maps obtained through the feature pyramid fusion network.
The segmentation system is characterized in that: let Fi ∈ R^(H×W×C) be the feature map at the i-th level of the feature pyramid fusion network, where H, W and C are its height, width and number of channels, and let the ground-truth boxes provided by the training set be defined as

Bi = (x0(i), y0(i), x1(i), y1(i), c(i))

where the 4 coordinates are the abscissa and ordinate of the upper-left corner and of the lower-right corner of the ground-truth box and c(i) is its category; any position (x, y) on the feature map is mapped back to the input image by the formula

(⌊s/2⌋ + x·s, ⌊s/2⌋ + y·s)

where s is the total stride before this level, and the FCOS head detection network regresses a detection box directly for each position; if a position (x, y) is inside a ground-truth box it is treated as a positive sample, otherwise as a negative sample; FCOS defines a 4-dimensional vector t* = (l*, t*, r*, b*) as the regression target, whose four components are the distances from the position to the four sides of the box; if a position lies in several proposal boxes, the one with the smallest area is selected as the target proposal box; the last layer of the object detector network predicts the 1-dimensional classification label and 4-dimensional box information of the target proposal box, 4 convolution layers being appended to the features for classification and regression respectively, and since the regression results are positive, exp(x) is used to map the regressed values in the regression branch; the loss function is as follows:

L({px,y}, {tx,y}) = (1/Npos) Σx,y Lcls(px,y, c*x,y) + (λ/Npos) Σx,y 1{c*x,y > 0} · Lreg(tx,y, t*x,y)

where Lcls is the classification loss, Lreg is the intersection-over-union (IoU) loss, Npos is the number of positive samples, and λ = 1; FCOS detects objects of different sizes on different feature levels, using the 5 feature levels {P3, P4, P5, P6, P7} with strides s of 8, 16, 32, 64 and 128 respectively; FCOS adds a single branch to predict the center-ness of a location by:

centerness* = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )

BCE loss is used for this branch during training, and at inference the center-ness is multiplied onto the classification score, so that predictions generated by positions far from the object center are suppressed.
The segmentation system, wherein the object detector performs a Soft NMS operation on the proposal boxes to obtain the final detection boxes, comprises:
a. first, the intersection-over-union (IoU) between the different proposal boxes is calculated;
b. for a proposal box whose IoU exceeds the threshold, the confidence score is not set to 0 directly; instead the score of the box is reduced, as shown in the following formula:

Ci = Ci, if iou(M, di) < Ht;  Ci = Ci · (1 − iou(M, di)), if iou(M, di) ≥ Ht

where Ci is the proposal-box score of each leaf, M is the proposal box with the highest current score, di is one of the remaining proposal boxes, Ht is the preset confidence threshold, and iou is the intersection-over-union; proposal boxes whose score exceeds a preset value are output as detection boxes.
The segmentation system, wherein the Soft NMS operation on the proposal boxes further comprises:
c. if the IoU between the proposal box M with the highest current score and a remaining proposal box di is greater than or equal to the set threshold Ht, a Gaussian weighting function is used in place of the lower half of the formula in step b, as shown in the following formula:

Ci = Ci · exp(−iou(M, di)² / σ), di ∈ D
where σ is the width parameter of the function and D is the domain.
The segmentation system, wherein: the coefficient predictor performs weight prediction on the instance information of the detection boxes output by the object detector to generate the instance coefficients corresponding to each detection box.
The segmentation system, wherein: the semantic segmentation network comprises a first convolution layer, an attention module and a second convolution layer; the first convolution layer extracts features from the P3-P6 feature maps obtained through the feature pyramid fusion network, the attention module further enhances the representation of the features extracted by the first convolution layer and outputs them to the second convolution layer, and the second convolution layer upsamples the output of the attention module to generate 4 global segmentation maps.
The segmentation system, wherein: a feature map F ∈ R^(C×H×W) of the feature pyramid network, where R is the real number domain and C, H and W are the number of channels, height and width of the feature map, is input to the first convolution layer to expand the channels and fully extract features, generating a new feature map Min ∈ R^(C×H×W); the multi-scale dual attention module is then applied to infer in turn a spatial attention map Ws and a channel attention map Wc, and the whole process can be summarized as:

Mout = Wc ⊗ (Ws ⊗ Min)

where Mout is the final output of the multi-scale dual attention module and ⊗ denotes the element-wise product of matrices.
The segmentation system, wherein: the multi-scale dual attention module includes a spatial attention module and a channel attention module; the spatial attention module generates two feature descriptors, AvgPool(Min) and MaxPool(Min), along the channel axis using global average pooling and max pooling; it concatenates the descriptors and applies a convolution layer to generate the global spatial attention map MS_G, the first convolution kernel of this convolution layer reducing the channel dimension by a reduction ratio r; the global spatial context attention map MS_G is thus calculated by the following formula:

MS_G(Min) = B(PW1(δ(B(W0([AvgPool(Min); MaxPool(Min)])))))

where [;] denotes the concatenation operation, B is batch normalization, δ is the ReLU activation function, and W0 and PW1 are a 7×7 convolution layer and a point convolution layer with sizes C/r×H×W and C×H×W respectively;
the spatial attention module uses point convolution as a local context extractor to perform a convolution operation at each spatial location, and the local context MS_L is calculated by the following formula:

MS_L(Min) = B(PW1(δ(B(PW0(Min)))))

finally, broadcast addition combines and outputs the feature matrices, and the feature map Mout′ output by the spatial attention module is:

Mout′ = Min ⊗ σ(MS_G(Min) ⊕ MS_L(Min))

where ⊕ denotes matrix (broadcast) addition and σ is the Sigmoid activation function;
the channel attention module feeds the feature map output by the spatial attention module into its global branch to generate two groups of channel descriptors, AvgPool(Mout′) and MaxPool(Mout′); both descriptors are forwarded to a shared multi-layer convolution subnet, composed of two point convolution layers, to generate the global channel attention map MC_G:

MC_G(Mout′) = B(PW1(δ(B(PW0(AvgPool(Mout′)))))) ⊕ B(PW1(δ(B(PW0(MaxPool(Mout′))))))

a local branch is inserted in parallel into the channel attention module and keeps the same architecture as the local spatial attention; finally, broadcast addition summarizes and outputs the feature map Mout of the multi-scale dual attention module, which is calculated as:

Mout = Mout′ ⊗ σ(CG ⊕ CL)

where CG and CL are the global channel attention and the local channel attention respectively.
the segmentation system of, wherein: the fusion module superposes the 4 global segmentation graphs with the detection frames, multiplies the instance proportion corresponding to the detection frames by the instance proportion, and then outputs a final segmentation graph, which comprises the following steps:
a. cutting the 4 global segmentation maps by using all the detection frames to obtain the regions of the segmentation maps corresponding to all the detection frames;
b. performing interpolation operation on the cut region to adjust the size of the region to be consistent with the example proportion matrix;
c. multiplying the adjusted region by the proportion of the corresponding example to obtain a segmentation map of each detection frame;
d. and adding and combining the segmentation maps of all the detection frames to generate a final segmentation map.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the overall network architecture;
FIG. 3 is a schematic diagram of a multi-scale dual attention module;
FIG. 4 is a schematic view of a channel attention module;
FIG. 5 is a schematic view of a spatial attention module;
FIG. 6 is a diagram illustrating a split branch operation;
FIG. 7 is a diagram of the segmentation effect.
Detailed Description
The following detailed description of the present invention will be made with reference to the accompanying drawings 1-7.
As shown in fig. 2, the present invention provides a segmentation system based on a multi-scale double-attention mechanism and a full convolution neural network, which includes a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, an object detector, a coefficient predictor and a fusion module, wherein the semantic segmentation network includes a first convolution layer, an attention module and a second convolution layer.
As shown in FIG. 1, the leaf segmentation method based on the multi-scale double attention mechanism and the full convolution neural network of the invention comprises the following steps:
(1) The dataset provided by the Leaf Segmentation Challenge (LSC) is obtained, and the original usable pictures are obtained by decompressing the H5 files. The original pictures include the original RGB map, the label map, the binary map and the leaf center map.
The primary catalog of the dataset includes four folders: A1, A2 and A4 mainly store time-lapse top-view images of Arabidopsis, with 993 original RGB pictures, while A3 mainly stores time-lapse top-view images of tobacco, with 83 original RGB pictures. Each of the four folders includes a training set and a test set; the training set contains the original RGB map, label map, binary map and leaf center map, and the test set contains only the original RGB map and binary map.
(2) The training set is converted to a COCO _2017 data format for ease of manipulation and processing.
The primary catalog of a dataset in the COCO_2017 format includes two folders: an annotations folder storing the annotation json files, and a train2017 folder storing the original images.
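For illustration, a converted annotation file follows the standard COCO layout sketched below; the file name, coordinates and other values are placeholders, not data taken from the patent:

```python
# Minimal illustration of the COCO_2017-style annotation structure used for the training set;
# all field values below are placeholders (assumptions), not data from the patent.
coco_annotations = {
    "images": [{"id": 1, "file_name": "plant001_rgb.png", "height": 530, "width": 500}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        "bbox": [120.0, 80.0, 60.0, 45.0],   # [x, y, width, height] of one leaf
        "segmentation": [[120.0, 80.0, 180.0, 80.0, 180.0, 125.0, 120.0, 125.0]],
        "area": 2700.0, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "leaf"}],
}
```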
(3) Image enhancement operations are applied to the sample pictures in the training set to expand the training samples.
Sample pictures in the training set are randomly selected and subjected to at least one of the following operations to increase the number of samples: 1) horizontal and vertical flipping of the image; 2) affine transformation of the image, including translation, scaling and rotation; 3) illumination adjustment so that the image becomes darker. This improves the generalization ability of the model, reduces overfitting and effectively improves detection and segmentation accuracy.
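A minimal sketch of such an augmentation step with PyTorch/torchvision is shown below; the probabilities and parameter ranges are illustrative assumptions rather than values specified by the invention, and in a full pipeline the boxes and masks must be transformed together with the image:

```python
import random
import torchvision.transforms as T
import torchvision.transforms.functional as F

def augment(image):
    """Randomly apply flips, an affine transform and darkening to one training image.

    A sketch of the augmentation described above; probabilities and parameter
    ranges are assumptions, not values from the patent.
    """
    # 1) horizontal / vertical flipping
    if random.random() < 0.5:
        image = F.hflip(image)
    if random.random() < 0.5:
        image = F.vflip(image)
    # 2) affine transformation: translation, scaling and rotation
    if random.random() < 0.5:
        affine = T.RandomAffine(degrees=30, translate=(0.1, 0.1), scale=(0.8, 1.2))
        image = affine(image)
    # 3) illumination adjustment so the image becomes darker
    if random.random() < 0.5:
        image = F.adjust_brightness(image, brightness_factor=random.uniform(0.5, 0.9))
    return image
```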
(4) The expanded training-set leaf images and the test-set leaf images are input into the feature extraction network. The feature extraction network comprises the feature extraction backbone network and the feature pyramid network. The feature extraction backbone network is a VoVNet57 network.
(5) Features of the training-set and test-set images are extracted with a pre-trained VoVNet57 network, which comprises 3 convolution layers and 4 OSA modules arranged in sequence. The input of VoVNet57 is the original RGB picture; after the 3 convolution layers a set of 128-channel feature maps is output and fed to the first OSA module, whose output serves as the input of the next OSA module, and so on in turn, the output of each OSA module being retained. The feature maps output by VoVNet57 therefore have four levels, with sizes 1/4, 1/8, 1/16 and 1/32 of the original image and channel counts of 256, 512, 768 and 1024 respectively. The core building blocks of VoVNet57 are the OSA modules: each consists of 5 convolution layers with identical input and output channels whose features are aggregated only once at the last layer, and each convolution layer has two connections, one to the next layer to produce features with a larger receptive field and one aggregated directly into the final output feature map. This effectively relieves the information redundancy caused by the dense connections of traditional feature extraction backbones, enhances feature extraction, and improves extraction speed and GPU efficiency, while the pre-trained VoVNet57 network accelerates model convergence and improves model performance.
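A minimal PyTorch sketch of one OSA block as described above is given below; the channel widths are assumptions for illustration, not the exact VoVNet57 configuration:

```python
import torch
import torch.nn as nn

class OSAModule(nn.Module):
    """One-shot aggregation: 5 sequential 3x3 convolutions whose outputs are all
    concatenated once and fused by a 1x1 convolution.

    A sketch of the OSA block described above; exact channel widths per VoVNet57
    stage differ and are assumptions here.
    """
    def __init__(self, in_ch, stage_ch, out_ch, num_layers=5):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, stage_ch, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(stage_ch),
                nn.ReLU(inplace=True)))
            ch = stage_ch
        # the input and every intermediate output are concatenated and fused to out_ch
        self.concat_conv = nn.Sequential(
            nn.Conv2d(in_ch + num_layers * stage_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        outputs = [x]                 # each layer feeds the next layer...
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)         # ...and is also aggregated into the final output
        return self.concat_conv(torch.cat(outputs, dim=1))
```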
(6) The feature maps obtained from the VoVNet57 backbone are input into the feature pyramid network, which builds a feature pyramid from the hierarchical semantic features of the convolutional network. The 1/32-scale feature map output by the last OSA module of VoVNet57 is input to the feature pyramid network, which applies convolution and downsampling to generate a feature map of 1/128 the original image size, then progressively upsamples it to generate feature maps of 1/64, 1/32, 1/16 and 1/8 the original image size; the 1/32, 1/16 and 1/8 maps are fused with the corresponding-size feature maps produced by the feature extraction backbone network, and the fused maps together with the 1/128 and 1/64 maps are the feature maps finally generated by the feature pyramid network. The processing of the feature pyramid network consists of a top-down pathway with lateral connections: the small top-level feature map is enlarged by upsampling to the same size as the feature map of the previous stage, so that the strong semantic features of the top levels (useful for classification) and the high-resolution information of the bottom levels (useful for localization) are both exploited; the upsampling is implemented by nearest-neighbour interpolation. To combine high-level semantics with the precise localization ability of the lower levels, a lateral connection structure similar to a residual network is adopted: the lateral connection adds the upsampled features of the upper level to the features of the current level with the same resolution, finally yielding the P3-P7 feature maps (i.e. feature maps whose sizes are 1/8, 1/16, 1/32, 1/64 and 1/128 of the original image). The feature pyramid network thus further improves semantic expression through context information and increases feature map resolution, better preserving information about small objects and outputting features with stronger representational power.
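The top-down fusion with lateral connections and nearest-neighbour upsampling can be sketched as follows; this is a simplified, assumed implementation showing only the lateral fusion of the backbone levels, with the extra coarser levels left as a comment:

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down pathway with lateral connections, as described above.

    A sketch: `in_channels` are the backbone feature widths (assumed values),
    `out_channels` is the common FPN width.
    """
    def __init__(self, in_channels=(256, 512, 768, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.output = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):
        # feats: backbone maps ordered from high resolution (1/4) to low resolution (1/32)
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down: upsample the coarser map (nearest neighbour) and add it to the finer one
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        outs = [conv(x) for conv, x in zip(self.output, laterals)]
        # the coarser levels (P6, P7) would be produced by further downsampling; omitted here
        return outs
```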
(7) The P3-P7 feature maps obtained through the feature pyramid fusion network (i.e. feature maps whose sizes are 1/8, 1/16, 1/32, 1/64 and 1/128 of the original image) are input into the FCOS object detector, which obtains the category and position coordinates of the proposal box for every pixel of each feature map through classification and regression. Let Fi ∈ R^(H×W×C) be the feature map at the i-th level of the feature pyramid fusion network, where R is the real number domain and H, W and C denote its height, width and number of channels. Define the ground-truth boxes provided by the training set as

Bi = (x0(i), y0(i), x1(i), y1(i), c(i))

where the 4 coordinates are the abscissa and ordinate of the upper-left corner and of the lower-right corner of the ground-truth box and c(i) is its category. Any position (x, y) on the feature map can be mapped back onto the input image by the formula

(⌊s/2⌋ + x·s, ⌊s/2⌋ + y·s)

where s is the total stride before this level, and the FCOS object detector regresses a proposal box directly for each position. If a position (x, y) on the feature map is inside a ground-truth box it is treated as a positive sample, otherwise as a negative sample. The FCOS object detector defines a 4-dimensional vector t* = (l*, t*, r*, b*) as the regression target, whose four components are the distances from the position to the four sides of the proposal box. If a position lies inside several proposal boxes, the one with the smallest area is selected as the target proposal box. The last layer of the object detector predicts the 1-dimensional classification label and 4-dimensional box information of the target proposal box, and its loss function is:

L({px,y}, {tx,y}) = (1/Npos) Σx,y Lcls(px,y, c*x,y) + (λ/Npos) Σx,y 1{c*x,y > 0} · Lreg(tx,y, t*x,y)

where Lcls is the classification loss, Lreg is the intersection-over-union (IoU) loss, Npos is the number of positive samples, λ = 1, Σx,y denotes summation over all positions, c*x,y is the category to which the proposal box at position (x, y) belongs, 1{·} is an indicator function equal to 1 if c*x,y > 0 and 0 otherwise, tx,y and t*x,y are the positions of the proposal box and of the ground-truth box respectively, and px,y is the probability of belonging to the category, i.e. the confidence score of the proposal box at that position, obtained by the SoftMax function, which is defined as:

Pi = e^(Vi) / Σj e^(Vj)

where e is the exponential, Vi is the weight on the i-th neuron, and Σj denotes summation over all neurons. FCOS detects objects of different sizes on different feature levels, using the 5 feature levels {P3, P4, P5, P6, P7} with strides s of 8, 16, 32, 64 and 128 respectively. FCOS adds a single branch to predict the center-ness of a location by:

centerness* = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )

where min and max are the minimum and maximum functions and l*, r*, t* and b* are the distances from the position to the four sides of the box. Binary cross-entropy loss is used for this branch during training, and at inference the center-ness is multiplied onto the classification score, so that predictions generated by positions far from the object center are suppressed.
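A small sketch of how the regression targets (l*, t*, r*, b*) and the center-ness target defined above can be computed for feature-map locations is given below; the single-box simplification and tensor shapes are assumptions:

```python
import torch

def fcos_targets(points, gt_box):
    """Compute (l*, t*, r*, b*) and the center-ness target for each location.

    points: (N, 2) tensor of (x, y) locations already mapped back to the input image.
    gt_box: (4,) tensor (x0, y0, x1, y1) of one ground-truth box.
    A sketch of the formulas above; handling of multiple boxes and levels is omitted.
    """
    x, y = points[:, 0], points[:, 1]
    l = x - gt_box[0]
    t = y - gt_box[1]
    r = gt_box[2] - x
    b = gt_box[3] - y
    reg_targets = torch.stack([l, t, r, b], dim=1)
    # a location is a positive sample only if it lies inside the ground-truth box
    is_positive = reg_targets.min(dim=1).values > 0
    # centerness* = sqrt( min(l,r)/max(l,r) * min(t,b)/max(t,b) )
    lr = torch.stack([l, r], dim=1)
    tb = torch.stack([t, b], dim=1)
    centerness = torch.sqrt(
        (lr.min(dim=1).values / lr.max(dim=1).values.clamp(min=1e-6)).clamp(min=0) *
        (tb.min(dim=1).values / tb.max(dim=1).values.clamp(min=1e-6)).clamp(min=0))
    return reg_targets, centerness, is_positive
```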
(8) The object detector performs a Soft NMS operation on the proposal boxes to obtain the final detection boxes. To guarantee the recall rate of object detection, traditional methods usually apply NMS (non-maximum suppression): the generated proposal boxes are sorted by confidence score, the highest-scoring box is kept, and other boxes whose overlap with it exceeds a certain proportion are deleted. Although simple and effective, NMS has a serious drawback: it forces the scores of all neighbouring proposal boxes to zero, so if a real object lies in the overlap region its detection will fail. Soft NMS therefore proceeds as follows:
a. first, all proposal boxes are sorted by confidence score;
b. the proposal box with the highest confidence score is selected, and its intersection-over-union (IoU) with each of the remaining proposal boxes is calculated;
c. the confidence scores of the remaining proposal boxes are recalculated according to the IoU, as shown in the following formula:

Ci = Ci, if iou(M, di) < Ht;  Ci = Ci · (1 − iou(M, di)), if iou(M, di) ≥ Ht

where Ci is the confidence score of the proposal box of each leaf, M is the proposal box with the highest current confidence score, di is one of the remaining proposal boxes, Ht is the preset confidence threshold, and iou is the intersection-over-union; proposal boxes whose score exceeds a preset value (for example 0.6) are output as detection boxes;
d. when the IoU is smaller than the threshold, the confidence scores of the remaining proposal boxes are unchanged;
e. when the IoU is greater than the threshold, the confidence score of the remaining proposal box is not set to 0 directly; instead the score is decreased: if the IoU between the current proposal box M and a remaining proposal box di is greater than or equal to the set threshold Ht, the confidence score of that proposal box decreases linearly. Because the value Ci given by the formula in step c is not a continuous function, the score jumps when the IoU between a proposal box di and M exceeds the threshold Ht, and this jump causes large fluctuations in the detection result; a more stable, continuous score-resetting function is therefore needed to replace the lower half of the formula in step c, and a Gaussian weighting function is used, as shown in the following formula:

Ci = Ci · exp(−iou(M, di)² / σ), di ∈ D

where σ is the width parameter of the function and D is the domain;
f. the confidence scores of the remaining proposal boxes are computed against this box in turn;
g. the remaining proposal boxes are re-sorted by confidence score, and the above steps are repeated until all proposal boxes have been processed.
The calculation complexity of Soft NMS is the same as that of NMS, and the score attenuation mode is adopted, so that the recall rate of the model can be effectively improved, and the detection result is improved.
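The score-decay procedure of steps a-g can be sketched as follows; the threshold, σ and score cut-off values are illustrative assumptions:

```python
import torch

def soft_nms(boxes, scores, iou_thresh=0.5, sigma=0.5, score_thresh=0.001, method="gaussian"):
    """Soft NMS: decay the scores of overlapping proposal boxes instead of removing them.

    boxes: (N, 4) tensor (x0, y0, x1, y1); scores: (N,) confidence scores.
    A sketch of steps a-g above; parameter values are assumptions.
    """
    boxes, scores = boxes.clone().float(), scores.clone().float()
    keep = []
    while scores.numel() > 0:
        # pick the proposal box M with the highest current score
        top = int(torch.argmax(scores))
        M = boxes[top]
        keep.append(M)
        boxes = torch.cat([boxes[:top], boxes[top + 1:]])
        scores = torch.cat([scores[:top], scores[top + 1:]])
        if boxes.numel() == 0:
            break
        # intersection-over-union of M with every remaining proposal box
        x0 = torch.maximum(boxes[:, 0], M[0]); y0 = torch.maximum(boxes[:, 1], M[1])
        x1 = torch.minimum(boxes[:, 2], M[2]); y1 = torch.minimum(boxes[:, 3], M[3])
        inter = (x1 - x0).clamp(min=0) * (y1 - y0).clamp(min=0)
        area_m = (M[2] - M[0]) * (M[3] - M[1])
        area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (area_m + area_b - inter)
        if method == "gaussian":
            scores = scores * torch.exp(-(iou ** 2) / sigma)                     # Gaussian decay
        else:
            scores = torch.where(iou < iou_thresh, scores, scores * (1 - iou))   # linear decay
        mask = scores > score_thresh          # drop boxes whose score has decayed away
        boxes, scores = boxes[mask], scores[mask]
    return torch.stack(keep) if keep else boxes
```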
(9) The detection boxes are sent to the coefficient predictor, which performs weight prediction on the instance information of each detection box to generate the corresponding instance coefficients. The coefficient predictor is a convolution layer whose output is a 3D tensor that encodes instance-level information such as the rough shape and pose of an object.
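As a hypothetical illustration of such a head (the channel counts, number of bases K and coefficient matrix size M below are assumptions, not values from the patent), the coefficient predictor could be a single convolution layer whose per-location output is later reshaped into a K×M×M coefficient tensor:

```python
import torch.nn as nn

K, M = 4, 14          # assumed: number of segmentation maps (bases) and coefficient matrix size
fpn_channels = 256    # assumed FPN output width
# one convolution per FPN level; at each detection location its K*M*M output channels
# are later reshaped to a (K, M, M) instance-coefficient tensor
coefficient_predictor = nn.Conv2d(fpn_channels, K * M * M, kernel_size=3, padding=1)
```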
(10) The first convolution layer of the semantic segmentation network extracts features from the P3-P6 feature maps obtained through the feature pyramid fusion network, the attention module further refines the features extracted by the first convolution layer and outputs the adjusted feature map to the second convolution layer, and the second convolution layer upsamples the output of the attention module to generate 4 global segmentation maps. To better improve the representational power of the segmentation network and focus correctly on the target objects, the invention proposes a novel multi-scale dual attention module with spatial and channel descriptors; as shown in fig. 3, the feature map output by the first convolution layer passes through the spatial attention module and then the channel attention module to generate an attention weight map, which is multiplied element-wise with the input feature map to produce the final adjusted feature map. This module aggregates global and local features (as shown in fig. 2). A feature map F ∈ R^(C×H×W) of the feature pyramid network is taken as input, where R is the real number domain and C, H and W are the number of channels, height and width of the feature map. The feature map is first input into the first convolution layer to expand the channels and fully extract features, generating a new feature map Min ∈ R^(C×H×W); the multi-scale dual attention module is then applied to infer in turn a spatial attention map Ws and a channel attention map Wc. The whole process can be summarized as:

Mout = Wc ⊗ (Ws ⊗ Min)

where Mout is the final output of the multi-scale dual attention module, Min is the input feature map, ⊗ denotes the element-wise product of matrices, Ws is the spatial attention map and Wc is the channel attention map. Considering the arrangement order of the two modules, an ablation experiment was designed to select the optimal arrangement; accuracy was highest with the spatial attention module placed before the channel attention module, so this strategy is adopted.
As shown in fig. 4, the spatial attention module mainly focuses on the spatial dependencies of the convolution features and generates a spatial attention matrix to highlight information-rich regions. When computing spatial attention, the spatial attention module generates two feature descriptors, AvgPool(Min) and MaxPool(Min), along the channel axis using global average pooling and max pooling; these descriptors indicate the average-pooled and max-pooled features. Next, the spatial attention module concatenates the descriptors and applies a convolution layer to generate the global spatial attention map MS_G. To reduce parameters and improve training robustness, the first convolution kernel of this convolution layer reduces the channel dimension by a reduction ratio r. The global spatial context attention map MS_G can thus be calculated by the following formula:

MS_G(Min) = B(PW1(δ(B(W0([AvgPool(Min); MaxPool(Min)])))))

where [;] denotes the concatenation operation, B is batch normalization, MaxPool is max pooling, AvgPool is average pooling, Min is the input feature map, δ is the ReLU activation function, and W0 and PW1 are a 7×7 convolution layer and a point convolution layer with sizes C/r×H×W and C×H×W respectively.
A parallel local branch is added to the spatial attention module to enrich element context and improve multi-scale information expression. In this branch, the spatial attention module uses point convolution (convolution kernel size 1) as a local context extractor to perform a convolution operation at each spatial location. The local context MS_L can therefore be calculated by the following formula:

MS_L(Min) = B(PW1(δ(B(PW0(Min)))))

where B is batch normalization, PW0 and PW1 are point convolution layers, Min is the input feature map, and δ is the ReLU activation function. Finally, broadcast addition combines the two branches, and the feature map Mout′ output by the improved spatial attention module is:

Mout′ = Min ⊗ σ(CG ⊕ CL)

where ⊕ denotes matrix (broadcast) addition, ⊗ denotes the element-wise product of matrices, σ is the Sigmoid activation function, Min is the input feature map, and CG and CL are the global spatial attention MS_G(Min) and the local spatial attention MS_L(Min) respectively.
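Under the structure described above (global branch: channel-wise average/max pooling, concatenation, a 7×7 convolution with channel reduction and a point convolution; local branch: two point convolutions; fusion by broadcast addition and a Sigmoid gate), a PyTorch sketch could look as follows; the reduction ratio and intermediate widths are assumptions:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention with a global branch and a parallel local branch.

    A sketch under the description above; intermediate channel widths are assumptions.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(channels // reduction, 1)
        # global branch: W0 is a 7x7 convolution, PW1 a point (1x1) convolution
        self.global_branch = nn.Sequential(
            nn.Conv2d(2, mid, kernel_size=7, padding=3, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, kernel_size=1, bias=False),
            nn.BatchNorm2d(1))
        # local branch: PW0 and PW1 are point convolutions applied at every location
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, kernel_size=1, bias=False),
            nn.BatchNorm2d(1))

    def forward(self, x):
        # descriptors along the channel axis: average pooling and max pooling
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        g = self.global_branch(torch.cat([avg, mx], dim=1))   # global spatial attention C_G
        l = self.local_branch(x)                               # local spatial attention C_L
        return x * torch.sigmoid(g + l)                        # broadcast addition, Sigmoid gate
```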
As shown in FIG. 5, unlike the spatial attention module, the channel attention module captures the inter-dependencies between channels and learns the inter-channel relationships of the features, with the goal of assigning higher weights to more informative channels. To compute the channel attention map efficiently, the feature map Mout′ output by the spatial attention module is fed into the global branch of the channel attention module, generating two sets of channel descriptors, AvgPool(Mout′) and MaxPool(Mout′). Both descriptors are forwarded to a shared multi-layer convolution subnet to generate the global channel attention map MC_G. The shared subnet, denoted MLConv, consists of two point convolution layers instead of two fully connected layers and is specifically calculated as follows:

MC_G(Mout′) = B(PW1(δ(B(PW0(AvgPool(Mout′)))))) ⊕ B(PW1(δ(B(PW0(MaxPool(Mout′))))))

where MLConv denotes the multilayer convolution B(PW1(δ(B(PW0(·))))), MaxPool is max pooling, AvgPool is average pooling, Mout′ is the feature map output by the spatial attention module, PW0 and PW1 are point convolution layers, B is batch normalization and δ is the ReLU activation function. Similar to the spatial attention module, a local branch is inserted into the channel attention module in parallel; in this branch, point convolution is used as a local context extractor to perform a convolution operation at each spatial location. Finally, broadcast addition summarizes the branches and outputs the feature map Mout of the multi-scale dual attention module, which is specifically calculated as:

Mout = Mout′ ⊗ σ(CG ⊕ CL)

where Mout′ is the feature map output by the spatial attention module, σ is the Sigmoid activation function, CG and CL are the global channel attention and the local channel attention respectively, ⊗ denotes the element-wise product of matrices, and ⊕ denotes matrix addition.
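A corresponding sketch of the channel attention module (shared two-layer point-convolution subnet applied to the average- and max-pooled descriptors, plus a parallel local branch) is given below; the reduction ratio is an assumption and batch normalization is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention with a shared global subnet and a parallel local branch.

    A sketch under the description above; the reduction ratio is an assumption.
    """
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(channels // reduction, 1)
        # shared MLConv subnet: two point convolutions (replacing fully connected layers)
        self.mlconv = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False))
        # local branch with the same structure, applied to the un-pooled feature map
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False))

    def forward(self, x):
        avg = self.mlconv(F.adaptive_avg_pool2d(x, 1))   # average-pooled channel descriptor
        mx = self.mlconv(F.adaptive_max_pool2d(x, 1))    # max-pooled channel descriptor
        g = avg + mx                                     # global channel attention C_G
        l = self.local_branch(x)                         # local channel attention C_L
        return x * torch.sigmoid(g + l)                  # broadcast addition + Sigmoid gate
```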
(11) The fusion module superimposes the 4 global segmentation maps with the detection boxes, multiplies them by the instance coefficients corresponding to each detection box, and outputs the final segmentation map, as shown in fig. 6.
a. the 4 global segmentation maps are cropped with all the detection boxes to obtain the region of the segmentation maps corresponding to each detection box;
b. the cropped region is interpolated so that its size matches the instance coefficient matrix;
c. the resized region is multiplied by the corresponding instance coefficients to obtain a segmentation map for each detection box;
d. the segmentation maps of all detection boxes are added and combined to generate the final segmentation map.
The fusion module itself has translation variance, which enables the network to use different activations to distinguish and localize the leaves. The whole process of steps a-d above can be calculated by:

Mask(proposal) = Σk coefficientsk ⊗ I(RoIAlign(basesk, proposal))

where proposal is a detection box, bases are the regions of the segmentation maps corresponding to the detection box, coefficients are the instance coefficients of the detection box, I is the linear interpolation operation, and RoIAlign is the operation that fixes the size of the detection box region.
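The crop-resize-weight-sum procedure of steps a-d can be sketched as follows; for simplicity the sketch crops with plain tensor slicing rather than RoIAlign, and all shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def blend_masks(bases, boxes, coefficients):
    """Fuse the global segmentation maps (bases) with per-box instance coefficients.

    bases:        (4, H, W) global segmentation maps from the semantic branch.
    boxes:        (N, 4) detection boxes (x0, y0, x1, y1) in feature-map coordinates.
    coefficients: (N, 4, M, M) instance coefficient matrices from the coefficient predictor.
    A sketch of steps a-d above (shapes assumed); real implementations crop with RoIAlign.
    """
    masks = []
    for box, coeff in zip(boxes, coefficients):
        x0, y0, x1, y1 = [int(v) for v in box]
        # a. crop the 4 global segmentation maps with the detection box
        crop = bases[:, y0:y1, x0:x1]
        # b. interpolate the crop to the size of the instance coefficient matrix
        crop = F.interpolate(crop.unsqueeze(0), size=coeff.shape[-2:],
                             mode="bilinear", align_corners=False).squeeze(0)
        # c. multiply by the corresponding instance coefficients and
        # d. sum over the 4 maps to obtain this detection box's segmentation map
        masks.append((crop * coeff).sum(dim=0))
    return masks  # per-box masks; pasting them back onto the image gives the final map
```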
(12) The instance segmentation model is trained and the trained model is saved; the model parameters are iteratively optimized based on the defined loss function until the model converges; the trained model is then used on the test dataset to realize real-time segmentation of leaf images. FIG. 7 illustrates the effect of the trained model when segmenting pictures from the test set.
The invention can obtain the following beneficial effects:
(1) the invention adopts a one-stage object detection branch, which improves detection speed;
(2) the method utilizes a data enhancement technology comprising operations such as turning, affine transformation, illumination adjustment, light and shade contrast transformation and the like to carry out data enhancement on the training sample, enriches image data, expands the scale of a data set, solves the problem of sample shortage, and simultaneously enhances the robustness and generalization capability of the model;
(3) the invention adopts an FPN to extract features, avoiding the troublesome parameter tuning of traditional detection methods based on hand-crafted features such as edges, contours and textures;
(4) according to the invention, the automatic segmentation of the blades is realized by using a computer vision technology, compared with manual detection, the labor cost is saved, the production efficiency is improved, and the agricultural unmanned management is realized in a real sense;
(5) the invention provides a novel multi-scale double-attention mechanism which can improve the expression capability of a segmentation network in local and global dimensions;
(6) the invention effectively embeds the attention module into the segmentation network and generates corresponding position-sensitive segmentation maps, which helps distinguish between the leaves.

Claims (3)

1. A segmentation system based on a multi-scale double-attention mechanism and a fully convolutional neural network, comprising a feature extraction backbone network, a feature pyramid network, a semantic segmentation network, an object detector, a coefficient predictor and a fusion module, characterized in that: the feature extraction backbone network extracts features of the training-set and test-set images and sends them to the feature pyramid network; the feature pyramid network performs same-level feature map fusion to obtain the P3-P7 feature maps; the P3-P7 feature maps obtained through the feature pyramid fusion network are input into the object detector, which generates proposal-box categories and positions pixel by pixel to obtain the final detection boxes; the coefficient predictor performs weight prediction on the instance information of each detection box to generate the corresponding instance coefficients; the semantic segmentation network processes the P3-P6 feature maps obtained by the feature pyramid fusion network to generate 4 segmentation maps; and the fusion module superimposes the 4 segmentation maps with the detection boxes and multiplies them by the corresponding instance coefficients to output the final segmentation map.
2. The segmentation system of claim 1, wherein: the feature extraction backbone network is the VoVNet57 network.
3. The segmentation system of claim 1, wherein: the coefficient predictor performs weight prediction on the instance information of the detection boxes output by the object detector.
CN202110230518.4A 2021-03-02 2021-03-02 Leaf segmentation method based on multi-scale double-attention mechanism and full convolution neural network Active CN112837330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110230518.4A CN112837330B (en) 2021-03-02 2021-03-02 Leaf segmentation method based on multi-scale double-attention mechanism and full convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110230518.4A CN112837330B (en) 2021-03-02 2021-03-02 Leaf segmentation method based on multi-scale double-attention mechanism and full convolution neural network

Publications (2)

Publication Number Publication Date
CN112837330A (en) 2021-05-25
CN112837330B (en) 2024-05-10

Family

ID=75934347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110230518.4A Active CN112837330B (en) 2021-03-02 2021-03-02 Leaf segmentation method based on multi-scale double-attention mechanism and full convolution neural network

Country Status (1)

Country Link
CN (1) CN112837330B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254843A1 (en) * 2012-09-13 2015-09-10 The Regents Of The University Of California Lung, lobe, and fissure imaging systems and methods
CN111192277A (en) * 2019-12-31 2020-05-22 华为技术有限公司 Instance partitioning method and device
CN112381835A (en) * 2020-10-29 2021-02-19 中国农业大学 Crop leaf segmentation method and device based on convolutional neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ASHISH SINHA ET.AL: "Multi-scale self-guided attention for medical image segmentation", 《ARXIV:1906.02849V3 [CS.CV] 》, pages 3 - 6 *
HAO CHEN ET.AL: "BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation", 《ARXIV:2001.00309V3 [CS.CV] 》, pages 1 - 9 *
RUOHAO GUO ET.AL: "LeafMask: Towards Greater Accuracy on Leaf Segmentation", 《ARXIV:2108.03568V1 [CS.CV] 》, pages 1 - 10 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269139A (en) * 2021-06-18 2021-08-17 中电科大数据研究院有限公司 Self-learning large-scale police officer image classification model aiming at complex scene
CN113269139B (en) * 2021-06-18 2023-09-26 中电科大数据研究院有限公司 Self-learning large-scale police officer image classification model for complex scene
CN113486930B (en) * 2021-06-18 2024-04-16 陕西大智慧医疗科技股份有限公司 Method and device for establishing and segmenting small intestine lymphoma segmentation model based on improved RetinaNet
CN113486930A (en) * 2021-06-18 2021-10-08 陕西大智慧医疗科技股份有限公司 Small intestinal lymphoma segmentation model establishing and segmenting method and device based on improved RetinaNet
CN113538347A (en) * 2021-06-29 2021-10-22 中国电子科技集团公司电子科学研究院 Image detection method and system based on efficient bidirectional path aggregation attention network
CN113538347B (en) * 2021-06-29 2023-10-27 中国电子科技集团公司电子科学研究院 Image detection method and system based on efficient bidirectional path aggregation attention network
CN113379770A (en) * 2021-06-30 2021-09-10 华南理工大学 Nasopharyngeal carcinoma MR image segmentation network construction method, image segmentation method and device
CN113486879B (en) * 2021-07-27 2024-03-05 平安科技(深圳)有限公司 Image area suggestion frame detection method, device, equipment and storage medium
CN113486879A (en) * 2021-07-27 2021-10-08 平安科技(深圳)有限公司 Image area suggestion frame detection method, device, equipment and storage medium
CN113469287A (en) * 2021-07-27 2021-10-01 北京信息科技大学 Spacecraft multi-local component detection method based on instance segmentation network
CN113658206A (en) * 2021-08-13 2021-11-16 江南大学 Plant leaf segmentation method
CN113658206B (en) * 2021-08-13 2024-04-09 江南大学 Plant leaf segmentation method
CN113674142A (en) * 2021-08-30 2021-11-19 国家计算机网络与信息安全管理中心 Method, device, computer equipment and medium for ablating target object in image
CN113674142B (en) * 2021-08-30 2023-10-17 国家计算机网络与信息安全管理中心 Method and device for ablating target object in image, computer equipment and medium
CN113780187A (en) * 2021-09-13 2021-12-10 南京邮电大学 Traffic sign recognition model training method, traffic sign recognition method and device
CN113887455A (en) * 2021-10-11 2022-01-04 东北大学 Face mask detection system and method based on improved FCOS
CN113887455B (en) * 2021-10-11 2024-05-28 东北大学 Face mask detection system and method based on improved FCOS
CN114037833B (en) * 2021-11-18 2024-03-19 桂林电子科技大学 Semantic segmentation method for image of germchit costume
CN114037833A (en) * 2021-11-18 2022-02-11 桂林电子科技大学 Semantic segmentation method for Miao-nationality clothing image
CN114581670A (en) * 2021-11-25 2022-06-03 哈尔滨工程大学 Ship instance segmentation method based on spatial distribution attention
CN114418999A (en) * 2022-01-20 2022-04-29 哈尔滨工业大学 Retinopathy detection system based on lesion attention pyramid convolution neural network
CN114693930A (en) * 2022-03-31 2022-07-01 福州大学 Example segmentation method and system based on multi-scale features and context attention
CN114511576A (en) * 2022-04-19 2022-05-17 山东建筑大学 Image segmentation method and system for scale self-adaptive feature enhanced deep neural network
CN114913428A (en) * 2022-04-26 2022-08-16 哈尔滨理工大学 Remote sensing image target detection system based on deep learning
CN115661694A (en) * 2022-11-08 2023-01-31 国网湖北省电力有限公司经济技术研究院 Intelligent detection method, system, storage medium and electronic equipment for light-weight main transformer focusing on key characteristics
CN115661694B (en) * 2022-11-08 2024-05-28 国网湖北省电力有限公司经济技术研究院 Intelligent detection method and system for light-weight main transformer with focusing key characteristics, storage medium and electronic equipment
CN116188479A (en) * 2023-02-21 2023-05-30 北京长木谷医疗科技有限公司 Hip joint image segmentation method and system based on deep learning
CN116188479B (en) * 2023-02-21 2024-04-02 北京长木谷医疗科技股份有限公司 Hip joint image segmentation method and system based on deep learning
CN117152443B (en) * 2023-10-30 2024-02-23 江西云眼视界科技股份有限公司 Image instance segmentation method and system based on semantic lead guidance
CN117152443A (en) * 2023-10-30 2023-12-01 江西云眼视界科技股份有限公司 Image instance segmentation method and system based on semantic lead guidance

Also Published As

Publication number Publication date
CN112837330B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
CN112837330A (en) Leaf segmentation method based on multi-scale double attention mechanism and full convolution neural network
CN110428428B (en) Image semantic segmentation method, electronic equipment and readable storage medium
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108830326B (en) Automatic segmentation method and device for MRI (magnetic resonance imaging) image
CN108986058B (en) Image fusion method for brightness consistency learning
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN111797779A (en) Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion
CN113627228B (en) Lane line detection method based on key point regression and multi-scale feature fusion
CN110533022B (en) Target detection method, system, device and storage medium
CN112541508A (en) Fruit segmentation and recognition method and system and fruit picking robot
CN112381764A (en) Crop disease and insect pest detection method
CN112541904A (en) Unsupervised remote sensing image change detection method, storage medium and computing device
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN107506792B (en) Semi-supervised salient object detection method
CN111160407A (en) Deep learning target detection method and system
CN113657560A (en) Weak supervision image semantic segmentation method and system based on node classification
CN112749675A (en) Potato disease identification method based on convolutional neural network
CN115564983A (en) Target detection method and device, electronic equipment, storage medium and application thereof
CN112380917A (en) A unmanned aerial vehicle for crops plant diseases and insect pests detect
Chimakurthi Application of convolution neural network for digital image processing
CN115222998A (en) Image classification method
CN117576079A (en) Industrial product surface abnormality detection method, device and system
CN112418358A (en) Vehicle multi-attribute classification method for strengthening deep fusion network
CN115330759B (en) Method and device for calculating distance loss based on Hausdorff distance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant