CN111461110A - Small target detection method based on multi-scale image and weighted fusion loss - Google Patents

Small target detection method based on multi-scale image and weighted fusion loss

Info

Publication number
CN111461110A
CN111461110A (application CN202010134062.7A)
Authority
CN
China
Prior art keywords
image
convolution
feature
layer
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010134062.7A
Other languages
Chinese (zh)
Other versions
CN111461110B (en)
Inventor
林坤阳
罗家祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010134062.7A priority Critical patent/CN111461110B/en
Publication of CN111461110A publication Critical patent/CN111461110A/en
Application granted granted Critical
Publication of CN111461110B publication Critical patent/CN111461110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image and video processing, and relates to a small target detection method based on multi-scale images and weighted fusion loss, which comprises the following steps: extracting a plurality of groups of feature vectors from images of different scales based on an improved Mask RCNN model, fusing the plurality of groups of feature vectors, and constructing a feature pyramid; generating candidate detection frames based on the feature pyramid and screening them to obtain suggested detection frames; mapping the suggested detection frames back to the feature maps of the feature pyramid, and aligning and intercepting the feature maps; inputting the aligned suggested detection frames into a classifier layer to obtain the class confidence and position offset of each suggested detection frame; in the testing stage, screening the suggested detection frames according to their class confidence scores and performing non-maximum suppression; in the training stage, weighting the loss function calculated by the small target detection feature layer and fusing it with the loss functions of the large and medium target detection layers, so that the sensitivity of the model to small target objects is enhanced.

Description

Small target detection method based on multi-scale image and weighted fusion loss
Technical Field
The invention belongs to the field of image and video processing, and relates to a small target detection method based on multi-scale images and weighted fusion loss.
Background
With the development of machine learning and deep learning, the fields of pattern recognition and computer vision have received unprecedented attention and popularity thanks to the powerful learning ability of convolutional neural networks. In the era of widespread machine automation and artificial intelligence, the role played by cameras is increasingly comparable to that of human eyes, so the development of the computer vision field is particularly important and has received wide attention from industry and academia. Among its tasks, target detection is a prominent and continuously advancing direction of computer vision. However, many of the target objects in pictures and videos appear in extremely small form: they occupy very few pixels in a frame, in most cases fewer than 49 pixels, so the task of detecting tiny objects is both difficult and very important.
The difficulty of small target detection lies in scale: extracting feature information by feeding an image of a single size into the network model usually cannot take care of targets of every scale. Although the existing Mask RCNN model performs well on target detection, problems remain such as the single scale of the input image, uncertain resolution, insufficient utilization of context information, and insensitivity to small target objects.
Disclosure of Invention
Aiming at the defects of the traditional Mask RCNN model, the invention provides a small target detection method based on multi-scale images and weighted fusion loss.
The invention is realized by adopting the following technical scheme:
a small target detection method based on multi-scale images and weighted fusion loss is realized based on an improved Mask RCNN model and comprises the following steps:
s1, building an improved Mask RCNN model; the improved Mask RCNN model comprises: the system comprises a residual backbone network, a characteristic pyramid network layer, an area generation network layer, an interested frame alignment layer, a classifier layer, a loss function calculation layer and a test layer;
s2, constructing an image pyramid: carrying out scaling processing on the original image, and forming an image pyramid by the original image, the image with the reduced size and the image with the enlarged size;
s3, randomly cutting the image in the image pyramid;
s4, sending the randomly cut images into a residual backbone network for convolution, batch normalization and pooling, and outputting a plurality of groups of feature maps with different sizes;
s5, fusing a plurality of groups of feature maps with different scales, and further processing to obtain feature maps P2-P6;
s6, generating candidate detection frames which are not screened for the feature maps P2-P6 respectively;
s7, inputting the feature maps P2-P6 into the area generation network layer, and obtaining the offsets and confidences of the candidate detection frames through a series of convolution operations;
s8, combining the offset of the candidate detection frame in the S7 and the data of the candidate detection frame which is not screened and obtained in the S6, and screening the candidate detection frame with a set amount as the interested detection frame;
s9, respectively corresponding the interested detection frames to the feature maps P2-P6, and carrying out alignment operation;
s10, inputting the result of the alignment operation into a classification layer, and outputting the predicted category score, the category probability and the coordinate offset of the interested detection frame;
s11, inputting the predicted category scores, category probabilities and coordinate offsets of the detection frames of interest into the test layer, where the maximum of the category probability is selected as the predicted target category of each detection frame of interest, redundant detection frames of interest are then filtered by non-maximum suppression, and the finally predicted detection frames of interest and their corresponding predicted target categories are obtained in the test layer.
Further, the training phase also comprises:
s12, inputting the classification scores of the interested detection boxes predicted in S10 into a loss function calculation layer, and taking the classification scores and the actual classification labels as the input of a cross entropy function to calculate a classification loss value so as to obtain the classification prediction loss of the feature maps P2-P6;
the coordinate offset of the interested detection frame predicted in the S10 and the offset of the real target frame are taken as the input of a regression loss function, so that the regression prediction loss of the characteristic map P2-P6 is obtained;
s13, weighting the category prediction losses of the feature map P2 and the feature map P3 respectively, and adding the category prediction losses of the feature map P4, the feature map P5 and the feature map P6 to obtain a total category prediction loss;
weighting the regression prediction losses of the feature map P2 and the feature map P3 respectively, and adding the weighted regression prediction losses of the feature map P4, the feature map P5 and the feature map P6 to obtain a total regression prediction loss;
and S14, iteratively updating parameters and weights of the improved Mask RCNN model through back propagation, specifically, respectively utilizing total class prediction loss and total regression prediction loss, and performing optimization iteration and changing the weight value of the improved Mask RCNN model.
Further, the improvements of the improved Mask RCNN model comprise:
①, in the alignment of the detection frames of interest, the alignment is no longer performed uniformly; instead, different feature layers are aligned separately, and after alignment their outputs are not fused directly but are input into the classifier layer for classification and regression respectively and then into separate loss function calculation layers, where the loss function calculated by the small target feature layer is weighted and fused with the loss functions of the large and medium target layers;
②, adding an effective characteristic layer P6 in the original Mask RCNN model;
③, removing the image segmentation module in the original Mask RCNN and canceling the Mask branch.
Preferably, the scaling process performed on the original image in S2 includes:
the formula for scaling pictures is expressed as:
Image_New=Image*scale (1)
wherein: image _ New represents a zoomed picture, Image represents a picture before zooming, and scale represents a zooming scale;
the scale is determined by the following factors:
if the length of the minimum edge after the scaling is finished cannot be smaller than min _ dim, min () represents the minimum value operation, h represents the height of the original image, w represents the width of the original image, when min _ dim is larger than min (h, w),
scale=min_dim/min(h,w) (2)
otherwise scale is 1;
if the length of the longest edge after the scaling is finished is max _ dim, if the picture is scaled according to equation (2), and if the longest edge of the scaled picture exceeds max _ dim, the following steps are performed:
scale=max_dim/image_max (3)
otherwise, continuously scaling according to scale min _ dim/min (h, w);
the size of the final zoomed picture is max _ dim × max _ dim, and in addition, if the scale of the final zooming is larger than 1, the original picture is magnified by a bilinear interpolation method; for the part of the picture after the last scaling which is less than max _ dim, zero values are used to fill in the pixel values.
Preferably, the formula for randomly cropping the picture in S3 is expressed as follows:
Y1=randi([0,image_size(1)-crop_size(1)]) (4)
X1=randi([0,image_size(2)-crop_size(2)]) (5)
wherein: y1 and X1 represent the lower left-hand ordinate and lower left-hand abscissa, respectively, at which cropping of the picture begins; randi represents random access, and the access range is the range inside the small brackets; image _ size is the size of the picture before clipping, the width of the first dimension storing the picture, and the length of the second dimension storing the picture; crop _ size is the size of the area to be cut, the width of the first-dimension storage area and the length of the second-dimension storage area;
Y2=min(image_size(1),Y1+crop_size(1)) (6)
X2=min(image_size(2),X1+crop_size(2)) (7)
wherein: y2 and X2 respectively represent the ordinate and abscissa of the upper right corner at the start of cropping; randi represents random access; min () represents taking the minimum value;
the specific position of the cropping is determined by using the two coordinates obtained by the formulas (4) and (7), and if the cropping area overflows from the original image, pad filling is carried out to obtain the cropped image.
Preferably, the convolution of the residual backbone network comprises two convolution modules, block1 and block2, wherein:
the convolution module block1 workflow comprises:
①, for Branch 1, the output and input remain consistent;
②, for branch 2, sequentially using 1 × 1 convolution kernel, 3 × 3 convolution kernel and 1 × 1 convolution kernel to perform convolution operation, and performing mean value normalization on the output feature vector after each convolution is completed;
the convolution module block2 workflow comprises:
①, for branch 1, performing convolution operation by using 1 × 1 convolution kernel, and then performing mean value normalization on the output feature vector;
②, for branch 2, convolution operation is performed by using 1 × 1 convolution kernel, 3 × 3 convolution kernel and 1 × 1 convolution kernel in sequence, and mean normalization is performed on the output feature vector after each convolution is completed.
Preferably, outputting a plurality of sets of feature maps of different sizes in S4 includes:
for the original input, a five-layer feature map output is constructed: C2, C3, C4, C5, C6; for the input reduced to half the size of the original, a five-layer feature map output is constructed: C2s, C3s, C4s, C5s, C6s; for the input enlarged to twice the size of the original, a five-layer feature map output is constructed: C2l, C3l, C4l, C5l, C6l.
Preferably, step S5 includes:
s51, performing interpolation-based upsampling on C2s-C6s to double the size of C2s-C6s;
s52, performing maximum pooling on C2l-C6l to halve the size of C2l-C6l;
s53, adding C2-C6, the doubled C2s-C6s and the halved C2l-C6l to obtain C2-C6 fused with image features of different scales;
s54, further processing the C2-C6 fused with the image features of different scales to obtain feature maps P2-P6.
Preferably, S54 includes:
using 256 convolution kernels of 1 × 1 to convolve the C6 fused with image features of different scales to obtain a feature map P6 with an output of 16 × 16 × 256;
the C5 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P6 by a factor of two, and a 3 × 3 convolution is then carried out to obtain a feature map P5 with an output of 32 × 32 × 256;
the C4 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P5 by a factor of two, and a 3 × 3 convolution is then carried out to obtain a feature map P4 with an output of 64 × 64 × 256;
the C3 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P4 by a factor of two, and a 3 × 3 convolution is then carried out to obtain a feature map P3 with an output of 128 × 128 × 256;
the C2 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P3 by a factor of two, and a 3 × 3 convolution is then carried out to obtain a feature map P2 with an output of 256 × 256 × 256.
Preferably, the height h and the width w of the candidate detection frame are:
h = scale_length/√ratios
w = scale_length × √ratios
wherein: scale_length is the height and width, at the pixel level of the original image, of the candidate detection frame when its height equals its width (i.e. when the frame is square).
Compared with the existing Mask RCNN model small target detection, the method has the following beneficial effects:
(1) a method for constructing an image pyramid as Mask RCNN model input is provided, one image is preprocessed to be changed into multiple scales as input instead of a single scale, and the capability of extracting the characteristics of a small target object is enhanced.
(2) An image segmentation module in the original Mask RCNN model is removed, a Mask branch is eliminated, network parameters are reduced, and the training classification and regression part is more efficient.
(3) The alignment of the interested regions is not aligned uniformly any more, but different feature layers are aligned separately, then the different feature layers are input into the classification layers for classification and regression respectively, finally the input loss function calculation layers are separated, the loss functions calculated by the small target feature layer are weighted and fused with the loss functions of the large and medium target layers, the influence of the small target object on the model loss function is enhanced, and the model can learn the features of the small target object better.
(4) A layer of effective characteristic layer P6 is added in the original Mask RCNN model, so that the detection precision of a small target object is improved, the detection precision of a large object is ensured not to be reduced, and the detection accuracy and precision of the small target object in target detection are improved.
Drawings
FIG. 1 is a diagram of an improved Mask RCNN model architecture in accordance with an embodiment of the present invention;
FIG. 2 is a diagram illustrating the convolution blocks in the improved Mask RCNN model according to an embodiment of the present invention;
fig. 3 is a flow chart of implementation of small target detection in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
A small target detection method based on multi-scale images and weighted fusion loss is realized based on an improved Mask RCNN model and comprises the following steps:
and S1, constructing an improved Mask RCNN model.
In a preferred embodiment, the improved Mask RCNN model includes a backbone network part, a candidate window generation part and a classification layer part, and is built by using the keras platform, including: the residual backbone network, the feature pyramid network layer, the area generation network layer, the frame-of-interest alignment layer, the classifier layer, the loss function calculation layer and the test layer. Compared with the original Mask RCNN, the improvements comprise:
①, in the alignment of the regions of interest, different feature layers are aligned separately; after alignment their outputs are not fused directly but are input into the classifier layer for classification and regression respectively and then into separate loss function calculation layers, where the loss function calculated by the small target feature layer is weighted and fused with the loss functions of the large and medium target layers, enhancing the influence of small target objects on the model loss function and enabling the model to learn the features of small target objects better.
②, adding an effective characteristic layer P6 in the original Mask RCNN model, so that the detection precision of the small target object is improved, the detection precision of the large object is not reduced, and the detection accuracy and precision of the small target object in the target detection are improved.
③, removing the image segmentation module in the original Mask RCNN, and canceling the Mask branch, thereby reducing the network parameters and making the training classification and regression part more efficient.
And S2, constructing an image pyramid.
And respectively carrying out image size reduction and image size amplification on the original image data by one time, and reserving the original size image data, wherein the original image, the reduced image and the amplified image form an image pyramid together.
The first step of changing the size of the picture is to zoom the picture, and the formula for zooming the picture is expressed as:
Image_New=Image*scale (1)
wherein: image _ New represents the zoomed picture, Image represents the picture before zooming, and scale represents the zoom scale. The scale is determined by the following factors:
if the length of the minimum edge after the scaling is finished cannot be smaller than min _ dim, min () represents the minimum value operation, h represents the height of the original image, w represents the width of the original image, when min _ dim is larger than min (h, w),
scale=min_dim/min(h,w) (2)
otherwise scale is 1.
If the length of the longest edge after the scaling is finished is max _ dim, if the picture is scaled according to equation (2), and if the longest edge of the scaled picture exceeds max _ dim, the following steps are performed:
scale=max_dim/image_max (3)
otherwise, continue scaling by scale min _ dim/min (h, w).
And the size of the final zoomed picture is max _ dim × max _ dim, and in addition, if the scale of the final zooming is larger than 1, namely the original picture is zoomed in, the original picture is zoomed in by a bilinear interpolation method. For the part of the picture after the last scaling which is less than max _ dim, zero values are used to fill in the pixel values.
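As an illustrative, non-limiting sketch, the scaling rule of equations (1)-(3) can be written in Python as follows; min_dim = 800 and max_dim = 1024 are assumed default values not fixed by the text, and PIL is used only as one possible bilinear resize backend:

```python
import numpy as np
from PIL import Image as PILImage

def compute_scale(h, w, min_dim, max_dim):
    scale = 1.0
    if min_dim > min(h, w):                  # equation (2)
        scale = min_dim / min(h, w)
    if round(max(h, w) * scale) > max_dim:   # longest edge would exceed max_dim
        scale = max_dim / max(h, w)          # equation (3), with image_max = max(h, w)
    return scale

def scale_and_pad(image, min_dim=800, max_dim=1024):
    """Scale an (H, W, 3) uint8 image per equations (1)-(3) and zero-pad to max_dim."""
    h, w = image.shape[:2]
    scale = compute_scale(h, w, min_dim, max_dim)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    resized = np.asarray(PILImage.fromarray(image).resize(
        (new_w, new_h), resample=PILImage.BILINEAR))      # bilinear interpolation
    padded = np.zeros((max_dim, max_dim, image.shape[2]), dtype=image.dtype)
    padded[:new_h, :new_w] = resized          # pixels short of max_dim are zero-filled
    return padded, scale
```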
S3, the original image, the reduced image and the enlarged image in the image pyramid are each randomly cropped to 512 × 512 as the input of one training pass.
The formula for randomly cropping the picture is expressed as follows:
Y1=randi([0,image_size(1)-crop_size(1)]) (4)
X1=randi([0,image_size(2)-crop_size(2)]) (5)
wherein: y1 and X1 represent the lower left-hand ordinate and lower left-hand abscissa, respectively, at which cropping of the picture begins; randi represents random access, and the access range is the range inside the small brackets; image _ size is the size of the picture before clipping, the width of the first dimension storing the picture, and the length of the second dimension storing the picture; crop _ size is the size of the area to be cut, the width of the first dimension storage area, and the length of the second dimension storage area.
Y2=min(image_size(1),Y1+crop_size(1)) (6)
X2=min(image_size(2),X1+crop_size(2)) (7)
Wherein: y2 and X2 respectively represent the ordinate and abscissa of the upper right corner at the start of cropping; randi represents a random number, the number range is the range inside the small brackets, min () represents the minimum value, and the two numbers to be compared are inside the small brackets.
And determining the specific position of the cutting by using the two obtained coordinates, and if the cutting area overflows from the original image, performing pad filling to obtain the image after cutting. pad padding is to zero-pad the overflow area in three channels of pixels, i.e. each channel is assigned a value of 0.
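An illustrative Python sketch of the random cropping of equations (4)-(7) is given below; the 512 × 512 crop size follows the embodiment, and the zero padding of any overflowing area corresponds to the pad filling described above:

```python
import numpy as np

def random_crop(image, crop_size=(512, 512)):
    """Randomly crop an (H, W, 3) image per equations (4)-(7), zero-padding any overflow."""
    ih, iw = image.shape[:2]
    ch, cw = crop_size
    y1 = np.random.randint(0, max(ih - ch, 0) + 1)   # equation (4)
    x1 = np.random.randint(0, max(iw - cw, 0) + 1)   # equation (5)
    y2 = min(ih, y1 + ch)                            # equation (6)
    x2 = min(iw, x1 + cw)                            # equation (7)
    out = np.zeros((ch, cw, image.shape[2]), dtype=image.dtype)   # pad filling: all
    out[:y2 - y1, :x2 - x1] = image[y1:y2, x1:x2]                 # three channels zeroed
    return out
```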
And S4, sending the cut image into a residual backbone network for convolution, batch normalization and pooling, and outputting three groups of feature maps with different sizes.
In a preferred embodiment, the residual backbone network performs a total of 60 convolutions, with a maximum pooling at the beginning, an average pooling at the end, and a batch normalization after each convolution. Using specific convolution modules and according to the size relation of the feature maps after convolution, a five-layer feature map output is constructed for the original image input: C2, C3, C4, C5, C6; for the input reduced to half the size of the original, a five-layer feature map output is constructed: C2s, C3s, C4s, C5s, C6s; for the input enlarged to twice the size of the original, a five-layer feature map output is constructed: C2l, C3l, C4l, C5l, C6l.
The convolution formula is expressed as follows:
Output = Σ(i=1..n) Σ(j=1..n) w(i,j) × Input(i',j') (8)
where Output is the value of each point in the feature map output by the convolution, w(i,j) is the weight of the n × n convolution kernel at location (i, j), and Input(i',j') is the pixel value of the input map at the location corresponding to convolution kernel location (i, j).
For the max pooling operation, a kernel of size 3 × 3 is selected and slid over the input map with a stride of 2, and the maximum value in each 3 × 3 local receptive field is selected as the value of the corresponding point of the output map. The formula is expressed as:
Output_max = max(Area_input) (9)
wherein: Area_input represents the values of all the pixels in the local receptive field.
For the average value pooling operation, the values of all the pixel points of the input feature map are summed and then averaged, finally giving a 1 × 1 output without changing the number of channels. The formula is expressed as follows:
Output_avg = (1/(n × n)) × Σ(i=1..n) Σ(j=1..n) Input(i,j) (10)
wherein: Input(i,j) is the pixel value of the input map at pixel point (i, j), and n × n is the size of the input feature map.
Batch normalization converts the value distribution of each point of the convolution map output by each hidden layer of the deep neural network into a normal distribution with mean 0 and unit variance, so as to accelerate network convergence. For a batch of inputs, assuming n samples, the output of a certain hidden layer I is {z(1), z(2), z(3), z(4), ..., z(n)}. The mean of this batch of outputs is:
μ = (1/n) × Σ(i=1..n) z(i) (11)
The variance is then calculated:
σ² = (1/n) × Σ(i=1..n) (z(i) − μ)² (12)
the output is normalized (batch normalization), i.e. the output of each feed sample is subjected to the following operations:
z_norm(i) = (z(i) − μ)/√(σ² + ε) (13)
wherein ε is added to prevent an invalid calculation when the variance is 0.
Since such normalization alone weakens the expressive power of the data, the invention introduces two learnable parameters γ and β, and the batch normalized output obtained above is further processed as follows:
z_out(i) = γ × z_norm(i) + β (14)
thereby restoring the expressive power of the data itself.
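For clarity, equations (11)-(14) can be sketched in Python as follows; the function and parameter names are illustrative:

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Batch normalization of a (batch, features) output per equations (11)-(14)."""
    mu = z.mean(axis=0)                       # equation (11): batch mean
    var = ((z - mu) ** 2).mean(axis=0)        # equation (12): batch variance
    z_norm = (z - mu) / np.sqrt(var + eps)    # equation (13): normalize, eps guards var = 0
    return gamma * z_norm + beta              # equation (14): learnable scale and shift

# example: a batch of 32 hidden-layer outputs with 256 features
out = batch_norm(np.random.randn(32, 256), gamma=np.ones(256), beta=np.zeros(256))
```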
In a preferred embodiment, the activation function of the improved Mask RCNN model is unified by a Sigmoid function, and the mathematical expression of the Sigmoid function is as follows:
Sigmoid(x) = 1/(1 + e^(−x)) (15)
s5, fusing three groups of feature maps C2S-C6S, C2-C6 and C2l-C6l with different scales obtained in S4, and further obtaining feature maps P2-P6. The specific operation mode is as follows:
①, performing interpolation-based upsampling on C2s-C6s to double the size of C2s-C6s;
②, performing maximum pooling on C2l-C6l to halve the size of C2l-C6l;
the principle of maximum pooling is as follows: selecting a kernel with the size of 3 x 3, sliding in the input graph by the step size of 2, selecting the maximum value in the sliding 3 x 3 area as the value of the corresponding point of the output graph, and expressing the formula as follows:
Output_max = max(Area_input) (16)
wherein: Area_input represents the values of all the pixels in the local receptive field.
③, adding C2-C6, C2s-C6s which is enlarged by one time and C2l-C6l which is reduced by one time to obtain C2-C6 with fused image features of different scales.
C2-C6 at this time is different from C2-C6 in S4, and C2-C6 at this time fuses the features extracted from the images with different scales.
④, and further processing the C2-C6 fused with the image features of different scales to obtain feature maps P2-P6.
Specifically, the method comprises the following steps: the C6 fused with image features of different scales is convolved with 256 convolution kernels of 1 × 1 to obtain a feature map P6 with an output of 16 × 16 × 256.
The C5 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P6 by a factor of two, and a 3 × 3 convolution is then performed to obtain a feature map P5 with an output of 32 × 32 × 256.
The C4 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P5 by a factor of two, and a 3 × 3 convolution is then performed to obtain a feature map P4 with an output of 64 × 64 × 256.
The C3 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P4 by a factor of two, and a 3 × 3 convolution is then performed to obtain a feature map P3 with an output of 128 × 128 × 256.
The C2 fused with image features of different scales is convolved by 256 convolution kernels of 1 × 1, the result is added to the output obtained by upsampling P3 by a factor of two, and a 3 × 3 convolution is then performed to obtain a feature map P2 with an output of 256 × 256 × 256.
The above obtained P2-P6 are combined together to form a feature map matrix.
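An illustrative Python sketch of the fusion in S5 and of the top-down construction of P2-P6 is given below. The nearest-neighbour upsampling, 2 × 2 max pooling and randomly initialised 1 × 1 projection are simplifications standing in for the interpolation-based upsampling, the 3 × 3 pooling kernel and the trained convolution kernels described above:

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour stand-in for the interpolation-based 2x upsampling
    return x.repeat(2, axis=0).repeat(2, axis=1)

def maxpool2x(x):
    # 2x2 max pooling stand-in for the 3x3, stride-2 max pooling
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def fuse_scales(C, Cs, Cl):
    # C:  maps of the original image; Cs: maps of the half-size image (upsampled by 2);
    # Cl: maps of the double-size image (pooled by 2); element-wise addition fuses them
    return [c + upsample2x(cs) + maxpool2x(cl) for c, cs, cl in zip(C, Cs, Cl)]

def conv1x1(x, out_ch=256):
    # randomly initialised 1x1 projection standing in for the trained 256-kernel 1x1 conv
    w = np.random.randn(x.shape[-1], out_ch) * 0.01
    return x @ w

def build_pyramid(C_fused):
    C2, C3, C4, C5, C6 = C_fused
    P6 = conv1x1(C6)
    P5 = conv1x1(C5) + upsample2x(P6)   # a 3x3 smoothing convolution would follow
    P4 = conv1x1(C4) + upsample2x(P5)   # in the full model for P2-P5
    P3 = conv1x1(C3) + upsample2x(P4)
    P2 = conv1x1(C2) + upsample2x(P3)
    return [P2, P3, P4, P5, P6]

# usage: P2, P3, P4, P5, P6 = build_pyramid(fuse_scales(C, Cs, Cl))
```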
And S6, generating an unscreened candidate detection frame on the feature map P2-P6 obtained in S5.
The height h and width w of the candidate detection frame are as follows:
h = scale_length/√ratios (17)
w = scale_length × √ratios (18)
wherein: scale_length is the height and width, at the pixel level of the original image, of the candidate detection frame when its height equals its width; for P2 to P6 it takes the values 32, 64, 128, 256 and 512, respectively. ratios gives the three aspect ratios of the candidate detection frames at each size: 0.5, 1 and 2.
Candidate detection frames of different sizes and different scales are generated on the feature maps P2-P6, taking the pixel points as centers: each pixel point of each layer of feature map serves as the center coordinate of a group of candidate detection frames.
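An illustrative Python sketch of the candidate detection frame generation is given below; the parameterisation of equations (17)-(18), with h = w = scale_length at ratio 1, is an assumed standard form, and the strides 4, 8, 16, 32, 64 for P2-P6 follow from the feature map sizes of a 1024 × 1024 input:

```python
import itertools
import numpy as np

def make_anchors(feat_size, scale_length, stride, ratios=(0.5, 1.0, 2.0)):
    """Generate candidate detection frames centred on every pixel of one feature map."""
    anchors = []
    for y, x in itertools.product(range(feat_size), repeat=2):
        cy, cx = (y + 0.5) * stride, (x + 0.5) * stride   # centre in original-image pixels
        for r in ratios:
            h = scale_length / np.sqrt(r)   # assumed form of equation (17)
            w = scale_length * np.sqrt(r)   # assumed form of equation (18)
            anchors.append([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])
    return np.array(anchors)

# P2-P6: feature sizes 256...16, scale_length 32...512, strides 4...64
levels = zip([256, 128, 64, 32, 16], [32, 64, 128, 256, 512], [4, 8, 16, 32, 64])
all_anchors = np.concatenate([make_anchors(n, s, st) for n, s, st in levels])
```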
S7, the feature maps P2 to P6 obtained in S5 are input to the RPN (Region Proposal Network, i.e. the area generation network layer), where they are convolved by a shared convolution layer without changing the feature map size. The convolved feature maps are then convolved with a 1 × 1 convolution kernel to obtain rpn_score; a softmax operation outputs the candidate detection frame confidence rpn_pro with 2 × (the number of candidate detection frames per pixel) channels; on this basis, another 1 × 1 convolution outputs the candidate detection frame offset information rpn_bbox with 4 × (the number of unscreened candidate detection frames per pixel) channels.
S8, rpn_bbox from S7 is combined with the unscreened candidate detection frames obtained in S6 to generate the detection frames of interest (also called ROI frames or suggested detection frames, i.e. the complete candidate detection frames after screening); their scores are obtained from the rpn_score of S7, the scores are ranked from large to small, non-maximum suppression is performed, and finally the first 1500 detection frames of interest are retained.
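The screening of S8 can be sketched in Python as follows; the box-offset parameterisation (dy, dx, dh, dw with exponential height/width scaling) and the greedy IoU non-maximum suppression are assumed standard forms, with the foreground confidence taken from rpn_pro (reshaped to two columns per frame) and the first 1500 frames retained:

```python
import numpy as np

def apply_deltas(boxes, deltas):
    """Refine (y1, x1, y2, x2) boxes with (dy, dx, dh, dw) offsets (assumed parameterisation)."""
    h, w = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
    cy, cx = boxes[:, 0] + 0.5 * h, boxes[:, 1] + 0.5 * w
    cy, cx = cy + deltas[:, 0] * h, cx + deltas[:, 1] * w
    h, w = h * np.exp(deltas[:, 2]), w * np.exp(deltas[:, 3])
    return np.stack([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2], axis=1)

def nms(boxes, scores, iou_thr=0.7, keep=1500):
    """Greedy non-maximum suppression, keeping at most `keep` boxes ranked by score."""
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order, kept = scores.argsort()[::-1], []
    while order.size and len(kept) < keep:
        i, rest = order[0], order[1:]
        kept.append(i)
        yy1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        xx1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        yy2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        xx2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(yy2 - yy1, 0, None) * np.clip(xx2 - xx1, 0, None)
        iou = inter / (area[i] + area[rest] - inter)
        order = rest[iou <= iou_thr]
    return boxes[kept]

def propose(anchors, rpn_bbox, rpn_prob):
    """rpn_bbox: (N, 4) offsets; rpn_prob: (N, 2) per-frame confidences (rpn_pro reshaped)."""
    refined = apply_deltas(anchors, rpn_bbox)
    return nms(refined, rpn_prob[:, 1])   # column 1 = foreground confidence
```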
And S9, performing alignment operation on the 1500 interested detection frames obtained in the S8 respectively corresponding to the P2-P6 feature maps.
In the alignment, the average target size of the P5 feature layer is 224 × 224 and its receptive field relative to the original image is 32, so the P5 feature layer should be aligned to a size of 7 × 7 (224/32 = 7). The receptive fields from the P2 to P6 layers increase by a factor of 2 layer by layer, so by analogy P2, P3, P4 and P6 are also aligned to a size of 7 × 7. The outputs after alignment are, respectively: align_p2, align_p3, align_p4, align_p5, align_p6.
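An illustrative Python sketch of the per-layer assignment in S9 is given below; the level-selection rule k = 5 + log2(√(w·h)/224) is an assumption consistent with the 224 × 224 average target size stated for P5, and the actual 7 × 7 ROI-align crop on each feature map is omitted:

```python
import numpy as np

def assign_level(box, k_min=2, k_max=6):
    """Map a proposal (y1, x1, y2, x2) to a pyramid level P2-P6 by its size."""
    h, w = box[2] - box[0], box[3] - box[1]
    k = 5 + np.log2(np.sqrt(max(h * w, 1e-6)) / 224.0)   # 224x224 boxes land on P5
    return int(np.clip(np.round(k), k_min, k_max))

def split_rois_by_level(rois):
    """Group the 1500 proposals per feature level; each group is then aligned to 7x7x256."""
    per_level = {k: [] for k in range(2, 7)}
    for box in rois:
        per_level[assign_level(box)].append(box)
    return per_level   # keys 2..6 correspond to align_p2 ... align_p6
```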
S10, the outputs align_p2, align_p3, align_p4, align_p5 and align_p6 obtained after the alignment in S9 are input into the classification layer, where they are converted into vectors by convolution operations, and the category score, the category probability and the coordinate offset of the predicted detection frames of interest are output.
Specifically, the (x, 256, 7, 7) vector is converted into (x, 256, 1, 1) data by a 7 × 7 convolution, and the (x, 256, 1, 1) vector is converted into (x, 81) by a full connection, the output being the category score, where 81 is the number of ROI frame categories; a softmax operation on the (x, 81) vector outputs the category probability; a further full connection then converts (x, 256, 1, 1) into (x, 81 × 4), where 4 represents the 4 coordinate offsets of each of the 81 categories of the ROI frame. Note that "x" above indicates the number of detection frames of interest for each layer of the feature map, and its value may differ between layers.
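An illustrative Keras sketch of the classifier layer of S10 is given below (the model is built on the keras platform according to the embodiment); the layer names are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

def classifier_head(num_classes=81):
    """7x7x256 aligned proposal -> 81 class scores, class probabilities, 81*4 offsets."""
    rois = layers.Input(shape=(7, 7, 256))                    # aligned proposals
    x = layers.Conv2D(256, (7, 7), padding="valid")(rois)     # -> (1, 1, 256)
    x = layers.Flatten()(x)
    class_scores = layers.Dense(num_classes, name="class_scores")(x)
    class_probs = layers.Softmax(name="class_probs")(class_scores)
    bbox_offsets = layers.Dense(num_classes * 4, name="bbox_offsets")(x)
    return tf.keras.Model(rois, [class_scores, class_probs, bbox_offsets])
```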
S11, in the testing stage, after S10, the category score, the category probability and the coordinate offset are input into the test layer, where the maximum value of the category probability is selected as the predicted target category of each detection frame of interest, and non-maximum suppression with a threshold of 0.7 is then carried out to filter redundant detection frames of interest. Finally, the finally predicted detection frames of interest and the corresponding predicted target categories are obtained in the test layer.
In the training phase, the method further comprises the following steps:
and weighting the loss function calculated by the small target detection feature layer, fusing the loss function with the loss functions of the large and medium target detection layers, enhancing the influence of the small target object on the model loss function, and enabling the improved Mask RCNN model to better learn the features of the small target object.
Specifically, the method comprises the following steps:
and S12, inputting the 81 category scores output by each interested detection box in each layer feature map P2-P6 in S10 into a loss function calculation layer, and using the loss function calculation layer and the actual category labels as the input of a cross entropy function to calculate a classification loss value to obtain category prediction loss. Wherein the cross entropy function is expressed as:
Figure BDA0002396690390000121
wherein: y'iFor the ith value in the true category label, c represents the total number of category labels, yiFor predicting the corresponding value in the vector after the category is subjected to softmax normalization, the more accurate the classification is, the more accurate yiThe closer to 1, the loss function value L ossy′The smaller (y) is.
The 4 coordinate offsets of the 81 categories output by each detection frame of interest of each layer of feature map in S10 are used, together with the actual target frame offsets, as the input of the regression loss function smooth_l1 to obtain the regression prediction loss. The function is expressed as:
smooth_l1(x) = 0.5 × x² if |x| < 1, |x| − 0.5 otherwise (20)
wherein: x represents the difference between the predicted detection frame of interest offset coordinates and the true target frame offset coordinates.
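For clarity, the two loss terms of equations (19) and (20) can be sketched in Python as follows; the function names and the 1e-12 numerical guard are illustrative:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(class_scores, one_hot_labels):
    """Equation (19): cross entropy over the softmax-normalised 81 category scores."""
    y = softmax(class_scores)
    return float(np.mean(-np.sum(one_hot_labels * np.log(y + 1e-12), axis=-1)))

def smooth_l1(pred_offsets, true_offsets):
    """Equation (20): 0.5*x^2 when |x| < 1, |x| - 0.5 otherwise."""
    x = np.abs(pred_offsets - true_offsets)
    return float(np.mean(np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)))
```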
S13, the category prediction losses Loss_class_P2 and Loss_class_P3 of the P2 and P3 layers in S12 are weighted and then added to Loss_class_P4, Loss_class_P5 and Loss_class_P6 to obtain the total category prediction loss:
Loss_class_P2'=4*Loss_class_P2
Loss_class_P3'=2*Loss_class_P3
Loss_class=Loss_class_P2'+Loss_class_P3'+Loss_class_P4+Loss_class_P5+Loss_class_P6 (21)
The regression prediction losses Loss_reg_P2 and Loss_reg_P3 of the P2 and P3 layers output in S12 are weighted and added to Loss_reg_P4, Loss_reg_P5 and Loss_reg_P6 to obtain the total regression prediction loss:
Loss_reg_P2'=4*Loss_reg_P2
Loss_reg_P3'=2*Loss_reg_P3
Loss_reg=Loss_reg_P2'+Loss_reg_P3'+Loss_reg_P4+Loss_reg_P5+Loss_reg_P6 (22)
Finally, Loss_class and Loss_reg are respectively used to perform optimization iterations and modify the weight values of the improved Mask RCNN model, thereby achieving the learning objective of the improved Mask RCNN model.
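An illustrative Python sketch of the weighted fusion of equations (21)-(22) is given below; the per-level loss values are placeholders, shown only to illustrate the call:

```python
def fuse_losses(per_level, w_p2=4.0, w_p3=2.0):
    """Weighted fusion per equations (21)-(22): weight P2 by 4 and P3 by 2, add P4-P6 as is."""
    return (w_p2 * per_level["P2"] + w_p3 * per_level["P3"]
            + per_level["P4"] + per_level["P5"] + per_level["P6"])

# placeholder per-level loss values
loss_class_per_level = {"P2": 0.8, "P3": 0.6, "P4": 0.5, "P5": 0.4, "P6": 0.3}
loss_reg_per_level = {"P2": 0.4, "P3": 0.3, "P4": 0.3, "P5": 0.2, "P6": 0.2}
Loss_class = fuse_losses(loss_class_per_level)   # equation (21)
Loss_reg = fuse_losses(loss_reg_per_level)       # equation (22)
```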
The present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1-3, a small target detection method based on multi-scale images and weighted fusion loss includes:
(1) Based on the improved Mask RCNN model, feature extraction is performed by the residual backbone network on the original image and on the images enlarged and reduced by one time, giving three groups of features of different scales; the three groups of features of different scales are then fused to obtain output feature maps C2, C3, C4, C5 and C6 in which features of different scales are merged.
In a preferred embodiment, different convolution modules are mainly used for extracting the features of the pictures with different scales. Convolution modules in the feature extraction layer (residual backbone network) are shown in fig. 2, and there are two types of convolution modules, which are respectively called "block 1" and "block 2" for short.
The convolution module block1 workflow comprises:
①, for Branch 1, the output and input remain consistent.
②, for branch 2, sequentially using 1 × 1 convolution kernel, 3 × 3 convolution kernel, 1 × 1 convolution kernel to perform convolution operation, and performing mean normalization on the output feature vector after each convolution is completed, specifically, the number of feature vector channels after each convolution operation is completed is in a proportional relationship of 1: 1: 4.
The convolution module block2 workflow comprises:
①, for branch 1, a convolution operation is performed using a 1 x1 convolution kernel followed by mean normalization of the output feature vector.
②, for branch 2, sequentially using 1 × 1 convolution kernel, 3 × 3 convolution kernel, 1 × 1 convolution kernel to perform convolution operation, and performing mean normalization on the output feature vector after each convolution is completed, specifically, the number of feature vector channels after each convolution operation is completed is in a proportional relationship of 1: 1: 4.
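An illustrative Keras sketch of the two convolution modules is given below; BatchNormalization stands in for the mean normalization described above, the 1:1:4 channel ratio is kept, and placing the Sigmoid activation after the addition is an assumption (the embodiment only fixes Sigmoid as the unified activation function):

```python
from tensorflow.keras import layers

def branch2(x, filters):
    # 1x1 -> 3x3 -> 1x1 convolutions, each followed by normalization (channel ratio 1:1:4)
    for k, f in [(1, filters), (3, filters), (1, 4 * filters)]:
        x = layers.Conv2D(f, (k, k), padding="same")(x)
        x = layers.BatchNormalization()(x)
    return x

def block1(x, filters):
    # branch 1 is the identity: output stays consistent with the input
    y = branch2(x, filters)
    return layers.Activation("sigmoid")(layers.Add()([x, y]))

def block2(x, filters):
    # branch 1 is a 1x1 convolution followed by normalization
    shortcut = layers.BatchNormalization()(layers.Conv2D(4 * filters, (1, 1))(x))
    y = branch2(x, filters)
    return layers.Activation("sigmoid")(layers.Add()([shortcut, y]))

# example stage in the style of C2 (assumed filter count 64)
inputs = layers.Input(shape=(256, 256, 64))   # e.g. the (x/4, x/4, 64) stem output
x = block2(inputs, 64)
x = block1(x, 64)
x = block1(x, 64)
```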
In a specific embodiment, it is assumed that pictures are input at a size of x × x × 3; the picture size is unified to the standard size of 1024 × 1024 for the original, that is, x = 1024 for the original, x = 512 when the picture is reduced by one time, and x = 2048 when the picture is enlarged by one time. For simplicity, the mean normalization operation performed after each convolution operation is not described again, and the padding and stride parameters of the convolutions and poolings are omitted when the padding is 0 and the stride is 1.
Preprocessing is performed on the pictures before the residual backbone network starts, specifically: the three input pictures are padded and convolved with 7 × 7 × 64 convolution kernels with a stride of 2, outputting a feature vector of size (x/2) × (x/2) × 64; max pooling with a 3 × 3 pooling kernel and a stride of 2 is then applied, outputting a feature vector of size (x/4) × (x/4) × 64.
The C2 layer sequentially comprises a block2, a block1, a block1 and a block2, and outputs a feature vector of spatial size (x/4) × (x/4).
The C3 layer sequentially comprises a block2, a block1, a block1 and a block1, and outputs a feature vector of spatial size (x/8) × (x/8).
The C4 layer sequentially comprises a block2, a block1, a block1, a block1 and a block1, and outputs a feature vector of spatial size (x/16) × (x/16).
The C5 layer sequentially comprises a block2, a block1 and a block1, and outputs a feature vector of spatial size (x/32) × (x/32).
The C6 layer sequentially comprises a block2, a block1 and a block1, and outputs a feature vector of spatial size (x/64) × (x/64).
(2) Next comes the feature pyramid network layer. After the five layers of feature vectors output in the previous step are passed through a 1 × 1 convolution to change the number of channels to 256, the addition-fused feature vectors of the n-th layer and the (n+1)-th layer are taken as the output of the Pn feature layer, while the highest layer P6 is output directly. In addition, to add more candidate detection frames, a maximum value pooling with stride 2 could be performed on P6 to obtain a P7 feature vector output, which is not adopted in this embodiment in order to save training space.
(3) Candidate detection frames are generated on the P2 to P6 feature maps, the candidate detection frames with three proportions and three length-width ratios are generated by taking each pixel point as the center of each feature map, and the length, the width and the center coordinates of the candidate detection frames are uniformly scaled to the interval of 0 to 1 according to the proportional relation of the sizes of the P2 to P6 feature maps.
(4) Next, at the RPN layer, through a series of convolution operations, the offset and confidence of the candidate detection boxes are generated, combined with the candidate detection boxes, ranked from high to low by the confidence, and the non-maximum suppression algorithm screens out 1500 final proposed detection boxes.
(5) The 1500 suggested detection boxes are then aligned on the corresponding original P2, P3, P4, P5 and P6 feature maps, that is, the feature layer to which each suggested detection box belongs is found according to the size of the box, and the corresponding region is intercepted on that feature map using an interpolation algorithm. Each feature layer contains the 7 × 7 × 256 outputs of its suggested detection boxes after alignment.
(6) The output of the proposed detection box alignment layer is re-input into the classifier layer. The classifier layer converts 7 × 7 feature vectors input by 7 × 7 convolution into 1 × 1 feature vectors without changing the number of channels, and converts the feature vectors into 81 classes of category score output and four coordinate position offsets of each class by using full connection; and performing softmax operation on the 81 class score output vectors, and outputting the result as class probability.
(7) In the testing stage, the category score information and the position offset information obtained by the classifier layer are input into the test layer, where the maximum value of the category probability is selected as the predicted target category of each detection frame of interest, and non-maximum suppression with a threshold of 0.7 is then used to filter redundant detection frames of interest. Finally, the final predicted detection frames and the corresponding predicted target classes are obtained in the test layer.
In the training stage, in the loss function calculation layer, the loss of the real information is calculated by utilizing the category score information and the position offset information obtained by the classifier layer, and the model parameters and the weight are modified through back propagation.
The method is specifically applied as follows:
step one, obtaining a picture containing a large number of small target objects, modifying the picture into a standard size, respectively performing up-sampling and down-sampling for one time, and forming an image pyramid with an original image to be used as input of an improved Mask RCNN model.
And step two, inputting the image pyramid into a residual backbone network of the improved Mask RCNN model, extracting the features of the three different-scale images in the image pyramid, and outputting three groups of feature vectors.
And step three, fusing the three groups of feature vectors through a feature pyramid network layer to construct a feature pyramid.
And step four, generating a candidate detection frame in each layer of the characteristic pyramid and sending the candidate detection frame into the RPN layer.
And step five, correspondingly generating confidence coefficient and position offset information for each candidate detection frame in the RPN layer, and screening a certain amount of effective suggested detection frames according to the confidence coefficient high-low ordering and non-maximum value inhibition.
And step six, correspondingly returning the screened suggestion detection frames to the feature graphs generated in the feature pyramid, aligning and intercepting the feature graphs, uniformly adjusting the feature graphs to be in a fixed size, and still separately placing the feature graphs according to different feature layers.
And step seven, inputting the aligned suggested detection frame into a classifier layer to obtain confidence degrees of all classes and position offset information of the suggested detection frame. Determining the category of the suggested detection frame according to the maximum category probability; then, considering the offset of the suggested detection frame as the offset corresponding to the maximum value of the category, and adjusting the position of the suggested detection frame; removing objects belonging to the background in each suggested detection frame, then taking a threshold value of 0.7 according to the confidence score of the maximum category of each suggested detection frame, then screening out a certain ROI (Region of interest) and performing non-maximum suppression.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A small target detection method based on multi-scale images and weighted fusion loss is characterized in that the method is realized based on an improved Mask RCNN model and comprises the following steps:
s1, building an improved Mask RCNN model; the improved Mask RCNN model comprises: the system comprises a residual backbone network, a characteristic pyramid network layer, an area generation network layer, an interested frame alignment layer, a classifier layer, a loss function calculation layer and a test layer;
s2, constructing an image pyramid: carrying out scaling processing on the original image, and forming an image pyramid by the original image, the image with the reduced size and the image with the enlarged size;
s3, randomly cutting the image in the image pyramid;
s4, sending the randomly cut images into a residual backbone network for convolution, batch normalization and pooling, and outputting a plurality of groups of feature maps with different sizes;
s5, fusing a plurality of groups of feature maps with different scales, and further processing to obtain feature maps P2-P6;
s6, generating candidate detection frames which are not screened for the feature maps P2-P6 respectively;
s7, inputting the feature maps P2-P6 into the area generation network layer, and obtaining the offsets and confidences of the candidate detection frames through a series of convolution operations;
s8, combining the offset of the candidate detection frame in the S7 and the data of the candidate detection frame which is not screened and obtained in the S6, and screening the candidate detection frame with a set amount as the interested detection frame;
s9, respectively corresponding the interested detection frames to the feature maps P2-P6, and carrying out alignment operation;
s10, inputting the result of the alignment operation into a classification layer, and outputting the predicted category score, the category probability and the coordinate offset of the interested detection frame;
s11, inputting the predicted category scores, category probabilities and coordinate offsets of the detection frames of interest into the test layer, where the maximum of the category probability is selected as the predicted target category of each detection frame of interest, redundant detection frames of interest are then filtered by non-maximum suppression, and the finally predicted detection frames of interest and their corresponding predicted target categories are obtained in the test layer.
2. The small object detection method according to claim 1, further comprising, in a training phase:
s12, inputting the classification scores of the interested detection boxes predicted in S10 into a loss function calculation layer, and taking the classification scores and the actual classification labels as the input of a cross entropy function to calculate a classification loss value so as to obtain the classification prediction loss of the feature maps P2-P6;
the coordinate offset of the interested detection frame predicted in the S10 and the offset of the real target frame are taken as the input of a regression loss function, so that the regression prediction loss of the characteristic map P2-P6 is obtained;
s13, weighting the category prediction losses of the feature map P2 and the feature map P3 respectively, and adding the category prediction losses of the feature map P4, the feature map P5 and the feature map P6 to obtain a total category prediction loss;
weighting the regression prediction losses of the feature map P2 and the feature map P3 respectively, and adding the weighted regression prediction losses of the feature map P4, the feature map P5 and the feature map P6 to obtain a total regression prediction loss;
and S14, iteratively updating parameters and weights of the improved Mask RCNN model through back propagation, specifically, respectively utilizing total class prediction loss and total regression prediction loss, and performing optimization iteration and updating the weight values of the improved Mask RCNN model.
3. The small object detection method according to claim 1, wherein the improvement of the improved Mask RCNN model comprises:
①, in the alignment of the detection frames of interest, the alignment is no longer performed uniformly; instead, different feature layers are aligned separately, and after alignment their outputs are not fused directly but are input into the classifier layer for classification and regression respectively and then into separate loss function calculation layers, where the loss function calculated by the small target feature layer is weighted and fused with the loss functions of the large and medium target layers;
②, adding an effective characteristic layer P6 in the original Mask RCNN model;
③, removing the image segmentation module in the original Mask RCNN and canceling the Mask branch.
4. The small object detection method according to claim 1, wherein the scaling of the original image in S2 includes:
the formula for scaling pictures is expressed as:
Image_New=Image*scale (1)
wherein: image _ New represents a zoomed picture, Image represents a picture before zooming, and scale represents a zooming scale;
the scale is determined by the following factors:
if the length of the minimum edge after the scaling is finished cannot be smaller than min _ dim, min () represents the minimum value operation, h represents the height of the original image, w represents the width of the original image, when min _ dim is larger than min (h, w),
scale=min_dim/min(h,w) (2)
otherwise scale is 1;
if the length of the longest edge after the scaling is finished is max _ dim, if the picture is scaled according to equation (2), and if the longest edge of the scaled picture exceeds max _ dim, the following steps are performed:
scale=max_dim/image_max (3)
otherwise, continuously scaling according to scale min _ dim/min (h, w);
the size of the final zoomed picture is max _ dim × max _ dim, and in addition, if the scale of the final zooming is larger than 1, the original picture is magnified by a bilinear interpolation method; for the part of the picture after the last scaling which is less than max _ dim, zero values are used to fill in the pixel values.
5. The small object detection method according to claim 1, wherein the formula for randomly cropping the picture in S3 is expressed as follows:
Y1=randi([0,image_size(1)-crop_size(1)]) (4)
X1=randi([0,image_size(2)-crop_size(2)]) (5)
wherein: y1 and X1 represent the lower left-hand ordinate and lower left-hand abscissa, respectively, at which cropping of the picture begins; randi represents random access, and the access range is the range inside the small brackets; image _ size is the size of the picture before clipping, the width of the first dimension storing the picture, and the length of the second dimension storing the picture; crop _ size is the size of the area to be cut, the width of the first-dimension storage area and the length of the second-dimension storage area;
Y2=min(image_size(1),Y1+crop_size(1)) (6)
X2=min(image_size(2),X1+crop_size(2)) (7)
wherein: y2 and X2 respectively represent the ordinate and abscissa of the upper right corner at the start of cropping; randi represents random access; min () represents taking the minimum value;
the specific position of the crop is determined by the two corner coordinates obtained from formulas (4)-(7), and if the cropping region overflows the original image, padding is applied to obtain the cropped image.
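The random crop of equations (4)-(7) can be sketched as follows; dimension ordering follows the equations (Y1 is drawn from the first dimension of image_size), a 3-channel image is assumed, and zero-valued padding is one possible reading of the padding mentioned above.

```python
import numpy as np

def random_crop(image, crop_size):
    """crop_size = (size along dim 0, size along dim 1); assumes crop_size <= image size."""
    image_size = image.shape[:2]
    y1 = np.random.randint(0, image_size[0] - crop_size[0] + 1)   # equation (4)
    x1 = np.random.randint(0, image_size[1] - crop_size[1] + 1)   # equation (5)
    y2 = min(image_size[0], y1 + crop_size[0])                    # equation (6)
    x2 = min(image_size[1], x1 + crop_size[1])                    # equation (7)
    crop = image[y1:y2, x1:x2]
    pad_d0 = crop_size[0] - crop.shape[0]                         # pad if the region falls short
    pad_d1 = crop_size[1] - crop.shape[1]
    if pad_d0 > 0 or pad_d1 > 0:
        crop = np.pad(crop, ((0, pad_d0), (0, pad_d1), (0, 0)), mode="constant")
    return crop
```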
6. The small object detection method according to claim 1, wherein the convolution of the residual backbone network comprises two types of convolution modules, block1 and block2, wherein:
the convolution module block1 workflow comprises:
①, for branch 1, the output remains identical to the input;
②, for branch 2, convolution operations are performed sequentially with a 1 × 1 convolution kernel, a 3 × 3 convolution kernel and a 1 × 1 convolution kernel, and mean value normalization is applied to the output feature vector after each convolution;
the convolution module block2 workflow comprises:
①, for branch 1, a convolution operation is performed with a 1 × 1 convolution kernel, and mean value normalization is then applied to the output feature vector;
②, for branch 2, convolution operations are performed sequentially with a 1 × 1 convolution kernel, a 3 × 3 convolution kernel and a 1 × 1 convolution kernel, and mean value normalization is applied to the output feature vector after each convolution.
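Read as standard residual modules, block1 keeps an identity shortcut while block2 projects the shortcut with a 1 × 1 convolution. The sketch below uses PyTorch; the channel counts, the ReLU activations and the use of BatchNorm2d for the "mean value normalization" are assumptions added for a runnable illustration and are not stated in the claim.

```python
import torch.nn as nn

class Block1(nn.Module):
    """block1: branch 1 is the identity; branch 2 is 1x1 -> 3x3 -> 1x1, each followed by normalization."""
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, mid_channels, 1), nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, 1), nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.branch2(x)   # branch 1 output stays identical to the input

class Block2(nn.Module):
    """block2: branch 1 is a 1x1 projection with normalization; branch 2 as in block1."""
    def __init__(self, in_channels, mid_channels, out_channels, stride=1):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride=stride), nn.BatchNorm2d(out_channels))
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, stride=stride), nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, 1), nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        return self.branch1(x) + self.branch2(x)
```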
7. The small object detection method according to claim 1, wherein outputting a plurality of sets of feature maps of different sizes in S4 includes:
for the original input, a five-layer feature map output is constructed: C2, C3, C4, C5, C6; for the input reduced to half of the original size, a five-layer feature map output is constructed: C2s, C3s, C4s, C5s, C6s; for the input enlarged to twice the original size, a five-layer feature map output is constructed: C2l, C3l, C4l, C5l, C6l.
8. The small object detection method according to claim 7, wherein step S5 includes:
S51, performing interpolation-based upsampling on C2s-C6s, enlarging C2s-C6s to twice their size;
S52, performing maximum pooling on C2l-C6l, reducing C2l-C6l to half their size;
S53, adding C2-C6, the enlarged C2s-C6s and the reduced C2l-C6l to obtain C2-C6 fused with image features of different scales;
S54, further processing the C2-C6 fused with image features of different scales to obtain the feature maps P2-P6.
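A minimal sketch of S51-S53 under the reading above (C2s-C6s from the half-size input, C2l-C6l from the double-size input): after 2× upsampling and 2× max pooling all three pyramids share the same spatial size and can be added element-wise. Bilinear interpolation and the variable names are assumptions.

```python
import torch.nn.functional as F

def fuse_multiscale(C, Cs, Cl):
    """C, Cs, Cl: lists of five feature maps [C2..C6], [C2s..C6s], [C2l..C6l]."""
    fused = []
    for c, cs, cl in zip(C, Cs, Cl):
        cs_up = F.interpolate(cs, scale_factor=2, mode="bilinear", align_corners=False)  # S51
        cl_down = F.max_pool2d(cl, kernel_size=2, stride=2)                              # S52
        fused.append(c + cs_up + cl_down)                                                # S53
    return fused   # C2-C6 fused with image features of different scales
```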
9. The small object detection method according to claim 8, wherein S54 includes:
using 256 convolution kernels of size 1 × 1 to convolve the C6 fused with image features of different scales, obtaining a feature map P6 with an output of 16 × 16 × 256;
the C5 fused with image features of different scales is convolved with 256 convolution kernels of size 1 × 1, the result is added to the output obtained by upsampling P6 by a factor of two, and a 3 × 3 convolution is then applied to obtain a feature map P5 with an output of 32 × 32 × 256;
the C4 fused with image features of different scales is convolved with 256 convolution kernels of size 1 × 1, the result is added to the output obtained by upsampling P5 by a factor of two, and a 3 × 3 convolution is then applied to obtain a feature map P4 with an output of 64 × 64 × 256;
the C3 fused with image features of different scales is convolved with 256 convolution kernels of size 1 × 1, the result is added to the output obtained by upsampling P4 by a factor of two, and a 3 × 3 convolution is then applied to obtain a feature map P3 with an output of 128 × 128 × 256;
the C2 fused with image features of different scales is convolved with 256 convolution kernels of size 1 × 1, the result is added to the output obtained by upsampling P3 by a factor of two, and a 3 × 3 convolution is then applied to obtain a feature map P2 with an output of 256 × 256 × 256.
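The top-down construction of claim 9 follows a standard feature-pyramid pathway: a 1 × 1 lateral convolution per level, 2× upsampling of the higher level, element-wise addition, and a 3 × 3 smoothing convolution, with P6 receiving only the lateral convolution. The sketch below assumes hypothetical channel counts for the fused C2-C6 maps and nearest-neighbour upsampling.

```python
import torch.nn as nn
import torch.nn.functional as F

class TopDownFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels[:-1]])   # 3x3 convs for P2-P5 only

    def forward(self, feats):                       # feats = [C2, C3, C4, C5, C6] after fusion
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        outputs = [laterals[-1]]                    # P6: 1x1 convolution of C6
        for i in range(len(laterals) - 2, -1, -1):  # build P5, P4, P3, P2 top-down
            up = F.interpolate(outputs[0], scale_factor=2, mode="nearest")
            outputs.insert(0, self.smooth[i](laterals[i] + up))
        return outputs                              # [P2, P3, P4, P5, P6]
```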
10. The small object detection method according to claim 1, wherein the height h and width w of the candidate detection frame are:
[The formulas for h and w are given as images in the original publication (FDA0002396690380000051 and FDA0002396690380000052) and are not reproduced here.]
wherein: scale_length denotes the side length, in pixels of the original image, of a candidate detection frame whose height equals its width.
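The formulas themselves are only available as images in the source; purely as an illustration, the widely used Mask RCNN convention derives the anchor height and width from a square of side scale_length and an aspect ratio, which may or may not coincide with the patented formula.

```python
import math

def anchor_hw(scale_length, ratio=1.0):
    # common convention (assumption): stretch a square anchor of side scale_length by sqrt(ratio)
    h = scale_length / math.sqrt(ratio)
    w = scale_length * math.sqrt(ratio)
    return h, w   # for ratio == 1.0 the height equals the width equals scale_length
```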
CN202010134062.7A 2020-03-02 2020-03-02 Small target detection method based on multi-scale image and weighted fusion loss Active CN111461110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010134062.7A CN111461110B (en) 2020-03-02 2020-03-02 Small target detection method based on multi-scale image and weighted fusion loss


Publications (2)

Publication Number Publication Date
CN111461110A true CN111461110A (en) 2020-07-28
CN111461110B CN111461110B (en) 2023-04-28

Family

ID=71682457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010134062.7A Active CN111461110B (en) 2020-03-02 2020-03-02 Small target detection method based on multi-scale image and weighted fusion loss

Country Status (1)

Country Link
CN (1) CN111461110B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124415A1 (en) * 2015-11-04 2017-05-04 Nec Laboratories America, Inc. Subcategory-aware convolutional neural networks for object detection
CN110348445A (en) * 2019-06-06 2019-10-18 华中科技大学 A kind of example dividing method merging empty convolution sum marginal information
CN110738642A (en) * 2019-10-08 2020-01-31 福建船政交通职业学院 Mask R-CNN-based reinforced concrete crack identification and measurement method and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAXIANG LUO ET AL.: "A Fast Circle Detection Method Based on a Tri-Class Thresholding for High Detail FPC Images", 《IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT》 *

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11797304B2 (en) 2018-02-01 2023-10-24 Tesla, Inc. Instruction set architecture for a vector computational unit
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11908171B2 (en) 2018-12-04 2024-02-20 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data
CN112016467A (en) * 2020-08-28 2020-12-01 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium
CN112016467B (en) * 2020-08-28 2022-09-20 展讯通信(上海)有限公司 Traffic sign recognition model training method, recognition method, system, device and medium
CN112052787A (en) * 2020-09-03 2020-12-08 腾讯科技(深圳)有限公司 Target detection method and device based on artificial intelligence and electronic equipment
CN112132206A (en) * 2020-09-18 2020-12-25 青岛商汤科技有限公司 Image recognition method, training method of related model, related device and equipment
CN112215179A (en) * 2020-10-19 2021-01-12 平安国际智慧城市科技股份有限公司 In-vehicle face recognition method, device, apparatus and storage medium
CN112215179B (en) * 2020-10-19 2024-04-19 平安国际智慧城市科技股份有限公司 In-vehicle face recognition method, device, apparatus and storage medium
CN112307976A (en) * 2020-10-30 2021-02-02 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN112257809A (en) * 2020-11-02 2021-01-22 浙江大华技术股份有限公司 Target detection network optimization method and device, storage medium and electronic equipment
CN112508863B (en) * 2020-11-20 2023-07-18 华南理工大学 Target detection method based on RGB image and MSR image double channels
CN112508863A (en) * 2020-11-20 2021-03-16 华南理工大学 Target detection method based on RGB image and MSR image dual channels
CN112381030B (en) * 2020-11-24 2023-06-20 东方红卫星移动通信有限公司 Satellite optical remote sensing image target detection method based on feature fusion
CN112381030A (en) * 2020-11-24 2021-02-19 东方红卫星移动通信有限公司 Satellite optical remote sensing image target detection method based on feature fusion
CN112418108A (en) * 2020-11-25 2021-02-26 西北工业大学深圳研究院 Remote sensing image multi-class target detection method based on sample reweighing
CN112419310B (en) * 2020-12-08 2023-07-07 中国电子科技集团公司第二十研究所 Target detection method based on cross fusion frame optimization
CN112419310A (en) * 2020-12-08 2021-02-26 中国电子科技集团公司第二十研究所 Target detection method based on intersection and fusion frame optimization
CN112841154A (en) * 2020-12-29 2021-05-28 长沙湘丰智能装备股份有限公司 Disease and pest control system based on artificial intelligence
CN112634313B (en) * 2021-01-08 2021-10-29 云从科技集团股份有限公司 Target occlusion assessment method, system, medium and device
CN112634313A (en) * 2021-01-08 2021-04-09 云从科技集团股份有限公司 Target occlusion assessment method, system, medium and device
CN112949520A (en) * 2021-03-10 2021-06-11 华东师范大学 Aerial photography vehicle detection method and detection system based on multi-scale small samples
CN112949520B (en) * 2021-03-10 2022-07-26 华东师范大学 Aerial photography vehicle detection method and detection system based on multi-scale small samples
CN112950703A (en) * 2021-03-11 2021-06-11 江苏禹空间科技有限公司 Small target detection method and device, storage medium and equipment
CN112950703B (en) * 2021-03-11 2024-01-19 无锡禹空间智能科技有限公司 Small target detection method, device, storage medium and equipment
CN113160141A (en) * 2021-03-24 2021-07-23 华南理工大学 Steel sheet surface defect detecting system
CN113326734B (en) * 2021-04-28 2023-11-24 南京大学 Rotational target detection method based on YOLOv5
WO2022227770A1 (en) * 2021-04-28 2022-11-03 北京百度网讯科技有限公司 Method for training target object detection model, target object detection method, and device
CN113326734A (en) * 2021-04-28 2021-08-31 南京大学 Rotary target detection method based on YOLOv5
JP2023527615A (en) * 2021-04-28 2023-06-30 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Target object detection model training method, target object detection method, device, electronic device, storage medium and computer program
CN113538331A (en) * 2021-05-13 2021-10-22 中国地质大学(武汉) Metal surface damage target detection and identification method, device, equipment and storage medium
CN113408429B (en) * 2021-06-22 2023-06-09 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN113408429A (en) * 2021-06-22 2021-09-17 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN114067110A (en) * 2021-07-13 2022-02-18 广东国地规划科技股份有限公司 Method for generating instance segmentation network model
CN113469100A (en) * 2021-07-13 2021-10-01 北京航科威视光电信息技术有限公司 Method, device, equipment and medium for detecting target under complex background
CN113657174A (en) * 2021-07-21 2021-11-16 北京中科慧眼科技有限公司 Vehicle pseudo-3D information detection method and device and automatic driving system
CN113657214B (en) * 2021-07-30 2024-04-02 哈尔滨工业大学 Building damage assessment method based on Mask RCNN
CN113657214A (en) * 2021-07-30 2021-11-16 哈尔滨工业大学 Mask RCNN-based building damage assessment method
CN113705387B (en) * 2021-08-13 2023-11-17 国网江苏省电力有限公司电力科学研究院 Interference object detection and tracking method for removing overhead line foreign matters by laser
CN113705387A (en) * 2021-08-13 2021-11-26 国网江苏省电力有限公司电力科学研究院 Method for detecting and tracking interferent for removing foreign matters on overhead line by laser
CN113628250A (en) * 2021-08-27 2021-11-09 北京澎思科技有限公司 Target tracking method and device, electronic equipment and readable storage medium
CN113743521B (en) * 2021-09-10 2023-06-27 中国科学院软件研究所 Target detection method based on multi-scale context awareness
CN113743521A (en) * 2021-09-10 2021-12-03 中国科学院软件研究所 Target detection method based on multi-scale context sensing
CN113870254A (en) * 2021-11-30 2021-12-31 中国科学院自动化研究所 Target object detection method and device, electronic equipment and storage medium
CN113963274A (en) * 2021-12-22 2022-01-21 中国人民解放军96901部队 Satellite image target intelligent identification system and method based on improved SSD algorithm
CN113963274B (en) * 2021-12-22 2022-03-04 中国人民解放军96901部队 Satellite image target intelligent identification system and method based on improved SSD algorithm
CN116645523A (en) * 2023-07-24 2023-08-25 济南大学 Rapid target detection method based on improved RetinaNet
CN116645523B (en) * 2023-07-24 2023-12-01 江西蓝瑞存储科技有限公司 Rapid target detection method based on improved RetinaNet


Similar Documents

Publication Publication Date Title
CN111461110A (en) Small target detection method based on multi-scale image and weighted fusion loss
CN111639692B (en) Shadow detection method based on attention mechanism
EP3540637B1 (en) Neural network model training method, device and storage medium for image processing
WO2020221013A1 (en) Image processing method and apparaus, and electronic device and storage medium
CN110570371B (en) Image defogging method based on multi-scale residual error learning
CN111126472A (en) Improved target detection method based on SSD
JP2022173399A (en) Image processing apparatus, and image processing method
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN111583097A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111898668A (en) Small target object detection method based on deep learning
CN111027382B (en) Attention mechanism-based lightweight face detection method and model
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN113762409A (en) Unmanned aerial vehicle target detection method based on event camera
CN111860683B (en) Target detection method based on feature fusion
CN115131797B (en) Scene text detection method based on feature enhancement pyramid network
CN112927209B (en) CNN-based significance detection system and method
CN110599455A (en) Display screen defect detection network model, method and device, electronic equipment and storage medium
CN113313810A (en) 6D attitude parameter calculation method for transparent object
CN114170526A (en) Remote sensing image multi-scale target detection and identification method based on lightweight network
CN113393434A (en) RGB-D significance detection method based on asymmetric double-current network architecture
CN112613442A (en) Video sequence emotion recognition method based on principle angle detection and optical flow conversion
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant