CN110503112B - Small target detection and identification method for enhancing feature learning - Google Patents

Small target detection and identification method for enhancing feature learning

Info

Publication number
CN110503112B
CN110503112B (application CN201910794606.XA)
Authority
CN
China
Prior art keywords
small target
convolution
layer
module
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910794606.XA
Other languages
Chinese (zh)
Other versions
CN110503112A (en)
Inventor
Cheng Jian
Lin Li
Li Can
Zhou Xiaoye
Li Yuenan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910794606.XA priority Critical patent/CN110503112B/en
Publication of CN110503112A publication Critical patent/CN110503112A/en
Application granted granted Critical
Publication of CN110503112B publication Critical patent/CN110503112B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24143Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection


Abstract

The invention discloses a small target detection and identification method for enhancing feature learning, belongs to the fields of image processing, pattern recognition and computer vision, and solves the problems of low detection precision and low network efficiency in prior-art small target detection and identification tasks. The method sequentially constructs a basic network module, a feature extraction module, a candidate frame generation module and a prediction output module as a small target detection and identification network; extracts small target sample image data and preprocesses it; inputs the preprocessed small target sample image data into the parameter-initialized small target detection and recognition network for training to obtain a trained network; and inputs the small target image to be predicted into the trained network, outputting the position and class information of the small target's prediction frame end to end through forward propagation. The method is used for small target detection and identification.

Description

Small target detection and identification method for enhancing feature learning
Technical Field
A small target detection and identification method for enhancing feature learning is used for small target detection and identification, and belongs to the fields of image processing, pattern recognition and computer vision.
Background
The task of target detection and identification remains one of the most active research directions in computer vision, and its wide engineering applications have driven rapid development and innovation in academic research. In practice, target detection and identification plays an important role in daily life: for example, face-recognition-based security inspection in important public transportation venues such as airports and railway stations, and vehicle license plate detection and recognition, which has practical significance for traffic regulation and driving safety.
The target detection and identification task differs from an ordinary classification task in that a traditional classifier only outputs a single class result, i.e., the probability that the input picture belongs to a certain class. Therefore, when a picture contains multiple objects of interest, a simple classification task cannot meet the requirement. In contrast, a detection and identification network can both locate each object of interest and judge its category. Small target detection and identification, a subtask of target detection and identification, has seen little improvement in recent years because neural networks learn small target features insufficiently well.
Most traditional target detection and identification methods are based on an anchor mechanism: a number of candidate frames are laid out on the feature map, compared against the real (ground-truth) frames, and, using a pre-designed evaluation criterion, the candidate closest to the real frame is selected as the network's prediction frame and its category is predicted. With the development of deep learning, detection and identification has steadily improved in performance, and current approaches divide into two solutions: 1) two-stage target detection and identification, in which the task is decomposed into two independent subtasks, detection followed by classification; and 2) single-stage target detection and identification, in which the task is realized end to end. Whether single-stage or two-stage, detection precision for small target objects remains low (a small target object is one whose pixel area in the image is smaller than 32x32). This is mainly due to two factors: 1) the features learned by the neural network have insufficient expressive capability for small target objects; and 2) the traditional anchor-based detection algorithm computes the IOU (the size of the overlapping area) between a candidate frame and the real frame and then rejects candidates whose IOU falls below a preset threshold. For small targets, however, the area mapped onto the original image from any feature layer is generally small, so this criterion causes missed detections with high probability and leaves network efficiency low (i.e., detection is slow).
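For illustration, the IOU criterion described above can be computed as in the following minimal Python sketch; the function name and the (x1, y1, x2, y2) corner format are illustrative assumptions, not taken from the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlapping area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A candidate whose IOU with every real frame falls below the preset threshold is rejected, which is exactly the step that tends to discard small targets.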
Disclosure of Invention
In view of the above research problems, an object of the present invention is to provide a small target detection and identification method for enhancing feature learning, which solves the problems of low detection precision and low network efficiency (i.e., low detection speed) of small target detection and identification tasks in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a small target detection and identification method for enhancing feature learning comprises the following steps:
s1, sequentially constructing a basic network module for extracting the features of a small target and outputting a preliminary feature map, a feature extraction module formed by two hourglass stacks for further extracting the features and outputting the feature map on the basis of the preliminary feature map, a candidate frame generation module for generating a candidate frame on the basis of the feature map, and a prediction output module for performing prediction frame coordinate regression and prediction frame class classification on the basis of the candidate frame, wherein the prediction output module is used as a small target detection and identification network, namely a deep neural network, and randomly initializing the parameters of the small target detection and identification network after construction;
s2, extracting small target sample image data based on the COCO data set, namely extracting small target sample image data with pixel point area of less than 32x32, and preprocessing the extracted small target sample image data; inputting the obtained preprocessed small target sample image data into a small target detection and recognition network with initialized parameters for training to obtain a trained small target detection and recognition network;
and S3, inputting the small target image to be predicted into the trained small target detection and identification network, and outputting the position and the class information of the prediction frame of the small target end to end through forward propagation.
Further, the basic network module in the step S1 is improved ResNet-101 or improved VGG16.
Further, the improved ResNet-101 sequentially includes an input layer, a first group of convolutional layers, a maximum pooling layer, a second group of convolutional layers, a third group of convolutional layers, a fourth group of convolutional layers, and a fifth group of convolutional layers, where the input layer takes 513x513 image data; the first group of convolutional layers sequentially comprises one 7x7 convolution operation and one nonlinear activation function operation; the second group sequentially comprises 9 convolutional layers, a nonlinear activation layer and an average pooling layer; the third group sequentially comprises 12 convolutional layers, a nonlinear activation layer and an average pooling layer; the fourth group sequentially comprises 69 convolutional layers, a nonlinear activation layer and an average pooling layer; and the fifth group sequentially comprises 9 convolutional layers, a nonlinear activation layer and an average pooling layer, where each convolutional layer in the second through fifth groups sequentially passes through a 1x1 convolution, a 3x3 convolution and a 1x1 convolution operation.
Further, the modified VGG16 sequentially includes a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, and a fifth convolutional layer, where the first convolutional layer and the second convolutional layer sequentially include convolution operations in which 2 convolution kernels are 3x3 and nonlinear activation function operations, and the third convolutional layer, the fourth convolutional layer, and the fifth convolutional layer sequentially include convolution operations in which 3 convolution kernels are 3x3 and nonlinear activation function operations.
Further, the feature extraction module in the step S1 is immediately connected to the basic network module; a single hourglass stack in the feature extraction module is composed of 3-order sampling units arranged in an hourglass shape, and each order's sampling unit comprises a convolution module and an identity mapping module. The convolution module sequentially comprises 1 down-sampling layer, 3 convolution layers and 1 up-sampling layer, where the second convolution layer of the convolution module in the second-order sampling unit is the convolution module of the first-order sampling unit, and the second convolution layer of the convolution module in the third-order sampling unit is the convolution module of the second-order sampling unit; each convolution layer is a 3x3 convolution layer, and the down-sampling ratio of the down-sampling layer is

1 / 2^d

where d denotes the d-th branch in the convolution module, and up-sampling in the up-sampling layer adopts bilinear interpolation; the identity mapping module makes a skip connection from the input of the convolution module's down-sampling layer to the output of its up-sampling layer, so that the deep network learns the detail information of shallow features.
Further, the candidate frame generation module generates, based on an anchor generation mechanism, 9 anchors of different sizes at each pixel position of the feature map output by the feature extraction module, and each anchor is mapped onto the original image and corresponds to one candidate frame.
Further, in the step S1, the prediction output module follows the candidate frame generation module. Performing prediction frame coordinate regression means regressing the prediction frame's coordinate position through a Smooth L1 loss function; prediction frame category classification means obtaining the category information of the corresponding prediction frame through a softmax loss function after regressing a value through wx + b on the feature map corresponding to the prediction frame, where x is the pixel value in the feature map and the prediction frame is a candidate frame. The specific steps are as follows:
the coordinates of the central point of the prediction frame are x and y, and the width and height are w and h respectively. Let the center position and size of any candidate (anchor) frame be x_a, y_a, w_a, h_a and those of the real frame be x_t, y_t, w_t, h_t. Setting the offset between the real frame and the candidate frame to g = (g_x, g_y, g_w, g_h), the specific solving formulas are:

g_x = (x_t - x_a) / w_a,   g_y = (y_t - y_a) / h_a

g_w = log(w_t / w_a),   g_h = log(h_t / h_a)
the actual offset obtained is l = (l_x, l_y, l_w, l_h), with specific solving formulas:

l_x = (x - x_a) / w_a,   l_y = (y - y_a) / h_a

l_w = log(w / w_a),   l_h = log(h / h_a)
the coordinate position of the prediction frame is regressed with a Smooth L1 loss function, with corresponding solving formula:

L_loc = Σ_{i ∈ Pos} smooth_L1(l_i - g_i)

where i ranges over the set of all positive samples, i.e., the prediction frame set;

and the Smooth L1 loss function is:

smooth_L1(x) = 0.5 x^2 if |x| < 1, and |x| - 0.5 otherwise.
after a value is regressed through wx + b on the feature map corresponding to the prediction frame, the category information of the corresponding prediction frame is obtained through a softmax loss function:

L_cls = -α log( e^{s_k} / Σ_c e^{s_c} )

where c is a predicted frame label (a predicted class), k is the real frame label (the real frame class of the small target sample image data), s_c is the score regressed for class c, L_cls is the classification loss function, and α is a hyperparameter that can be adjusted during experiments; the categories include plants, televisions, ships and chairs.
Further, in step S1, randomly initializing the parameters of the small target detection and identification network refers to pre-training the small target detection and identification network on a larger public dataset to obtain a set of initialized parameters, where the larger public dataset is ImageNet.
Further, when the small target detection and recognition network is trained in step S2, a central point discrimination module is added, which obtains the prediction output module's predicted frame from the candidate frames generated by the candidate frame generation module using the k-nearest-neighbor method and the non-maximum suppression method. The specific steps are: according to the center position of the real frame of the small target sample image data, the candidate frames corresponding to the k nearest center positions around it are determined with the k-nearest-neighbor method and taken as the preliminarily selected candidate frames; the optimal candidate frame is then obtained from these with the non-maximum suppression method.
Further, in the step S2, preprocessing the extracted small target sample image data refers to applying plus/minus 90-degree rotation, random cropping or scale scaling to the small target sample image data;
the specific implementation process of obtaining the trained small target detection and identification network is as follows: inputting the small target sample image data obtained by preprocessing into a small target detection and identification network with initialized parameters, carrying out forward propagation to obtain regression and classification results, solving the loss of the regression and classification results according to the small target sample image data, updating network parameters of the small target detection and identification network by utilizing the loss reverse propagation, and obtaining the trained small target detection and identification network after an iteration condition is met.
Compared with the prior art, the invention has the beneficial effects that:
1. In the small target detection network, the feature extraction module formed by two hourglass stacks effectively fuses semantic information and detail information in a pyramid-like fusion manner, which enhances the feature expression capacity for small targets and solves the problem of low small target detection precision; moreover, the central point discrimination module removes many redundant candidate frames, reducing frame computation in subsequent steps, and prevents small candidate frames from being discarded as negative samples during the NMS stage.
2. The method introduces a discrimination criterion based on the geometric distance to the real frame's center point, efficiently selecting the positive sample frames (prediction frames) in which the target is most likely to appear, thereby reducing the burden that redundant frames place on network computation.
3. The invention focuses more on the frames around the real target, thereby reducing the calculation of the frames far away from the real position, and improving the detection speed.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a network architecture diagram of the feature extraction module formed by two hourglass stacks and the prediction output module of the present invention;
FIG. 3 is a schematic diagram of a feature extraction module of the present invention;
fig. 4 is a schematic diagram of a center point discrimination prediction frame obtained according to the present invention.
FIG. 5 is a graph comparing the effect of SSD, DSSD, and feature extraction modules formed by 1 hourglass stack and by 2 hourglass stacks on the same basic network; One-hourglass denotes a feature extraction module containing only 1 hourglass stack, and Two-hourglass denotes a feature extraction module formed by 2 hourglass stacks; AP denotes average precision, and subscripts S, M and L denote small-, medium- and large-scale targets respectively; SSD denotes prediction from shallow and deep feature maps, and DSSD denotes fusing shallow and deep features in a deconvolution manner for prediction.
Fig. 6 is a schematic diagram comparing detection results of the present invention with the prior-art SSD and DSSD in an embodiment of the present invention, where (a) shows the SSD network's experimental result, (b) shows the DSSD network's experimental result, and (c) shows the experimental result of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
To solve the above problems, the invention uses two stacked hourglasses to form the feature extraction module, enhancing the semantic information of shallow features while fusing shallow and deep features to enhance the detail content of deep features, thereby strengthening the network's feature description of small target objects and improving small target detection precision. On the other hand, the original IOU evaluation criterion is improved: a center-point-based evaluation is used to screen the prediction frame closest to the real frame. Finally, a prediction discrimination module regresses the position information and category information of the prediction frame respectively.
As shown in fig. 1, a small target detection and identification method for enhancing feature learning includes the following steps:
s1, sequentially constructing a basic network module for extracting the features of a small target and outputting a preliminary feature map, a feature extraction module formed by two hourglass stacks for further extracting the features and outputting the feature map on the basis of the preliminary feature map, a candidate frame generation module for generating a candidate frame on the basis of the feature map, and a prediction output module for performing prediction frame coordinate regression and prediction frame class classification on the basis of the candidate frame, wherein the prediction output module is used as a small target detection and identification network, namely a deep neural network, and randomly initializing the parameters of the small target detection and identification network after construction;
the basic network module in the step S1 is improved ResNet-101 or improved VGG16.
As shown in fig. 2, the modified ResNet-101 sequentially includes an input layer, a first set of convolutional layers, a maximum pooling layer, a second set of convolutional layers, a third set of convolutional layers, a fourth set of convolutional layers, and a fifth set of convolutional layers, where the input layer takes 513x513 image data; the first set sequentially comprises one 7x7 convolution operation and one nonlinear activation function operation; the second set sequentially comprises 9 convolutional layers, a nonlinear activation layer and an average pooling layer; the third set sequentially comprises 12 convolutional layers, a nonlinear activation layer and an average pooling layer; the fourth set sequentially comprises 69 convolutional layers, a nonlinear activation layer and an average pooling layer; and the fifth set sequentially comprises 9 convolutional layers, a nonlinear activation layer and an average pooling layer, where each convolutional layer in the second through fifth sets sequentially passes through a 1x1 convolution, a 3x3 convolution and a 1x1 convolution operation.
The improved VGG16 comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer in sequence, wherein the first convolution layer and the second convolution layer sequentially comprise 2 convolution operations with convolution kernels of 3x3 and nonlinear activation function operations, and the third convolution layer, the fourth convolution layer and the fifth convolution layer sequentially comprise 3 convolution operations with convolution kernels of 3x3 and nonlinear activation function operations.
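As a sketch of the 1x1 -> 3x3 -> 1x1 pattern that each convolutional layer in the second through fifth groups passes through, the following PyTorch module may help; the channel counts, the module name and the placement of activations are assumptions, not specified by the patent:

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 reduce -> 3x3 convolution -> 1x1 expand, ResNet-style."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1),
        )

    def forward(self, x):
        return self.body(x)
```

The 1x1 convolutions reduce and restore channel width around the 3x3 convolution, which keeps the computation of the deep groups manageable.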
As shown in fig. 3, the feature extraction module is next to the basic network module; a single hourglass stack in the feature extraction module is composed of 3-order sampling units arranged in an hourglass shape, and each order's sampling unit comprises a convolution module and an identity mapping module. The convolution module sequentially comprises 1 down-sampling layer, 3 convolution layers and 1 up-sampling layer, where the second convolution layer of the convolution module in the second-order sampling unit is the convolution module of the first-order sampling unit, and the second convolution layer of the convolution module in the third-order sampling unit is the convolution module of the second-order sampling unit; each convolution layer is a 3x3 convolution layer, and the down-sampling ratio of the down-sampling layer is

1 / 2^d

where d denotes the d-th branch in the convolution module, and up-sampling in the up-sampling layer adopts bilinear interpolation; the identity mapping module makes a skip connection from the input of the convolution module's down-sampling layer to the output of its up-sampling layer, so that the deep network learns the detail information of shallow features.
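A recursive PyTorch sketch of the 3-order hourglass just described; the pooling choice, the channel width and the factor-2 per-order down-sampling (which compounds to the 1/2^d ratio across branches) are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class HourglassUnit(nn.Module):
    """One order: down-sample -> 3 conv layers (the middle one is the
    next-lower-order unit) -> bilinear up-sample, plus an identity skip."""
    def __init__(self, channels, order):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        if order > 1:
            self.inner = HourglassUnit(channels, order - 1)
        else:
            self.inner = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        down = F.max_pool2d(x, kernel_size=2)           # down-sampling layer
        mid = self.conv3(self.inner(self.conv1(down)))  # the 3 convolution layers
        up = F.interpolate(mid, size=x.shape[-2:],
                           mode='bilinear', align_corners=False)
        return x + up                                    # identity (skip) mapping

hourglass = HourglassUnit(channels=256, order=3)  # one 3-order hourglass stack
```

Stacking two such hourglasses end to end gives the feature extraction module of the patent.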
The candidate frame generation module generates, based on an anchor generation mechanism, 9 anchors of different sizes at each pixel position of the feature map output by the feature extraction module, and each anchor is mapped onto the original image and corresponds to one candidate frame.
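A NumPy sketch of such an anchor generation mechanism: 9 candidate frames (3 scales x 3 aspect ratios; the scale and ratio values here are illustrative, not from the patent) at every feature-map pixel, mapped back onto the original image through the feature stride:

```python
import numpy as np

def generate_anchors(fm_h, fm_w, stride,
                     scales=(16, 32, 64), ratios=(0.5, 1.0, 2.0)):
    """Return [fm_h * fm_w * 9, 4] anchors as (x1, y1, x2, y2) on the image."""
    anchors = []
    for y in range(fm_h):
        for x in range(fm_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # map to image
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.asarray(anchors)
```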
The prediction output module is next to the candidate frame generation module. Prediction frame coordinate regression is performed by regressing the prediction frame's coordinate position through a Smooth L1 loss function; prediction frame category classification means obtaining the category information of the corresponding prediction frame through a softmax loss function after regressing a value through wx + b on the feature map corresponding to the prediction frame, where x is the pixel value in the feature map and the prediction frame is a candidate frame. The specific steps are as follows:
the coordinates of the central point of the prediction frame are x and y, and the width and height are w and h respectively. Let the center position and size of any candidate (anchor) frame be x_a, y_a, w_a, h_a and those of the real frame be x_t, y_t, w_t, h_t. Setting the offset between the real frame and the candidate frame to g = (g_x, g_y, g_w, g_h), the specific solving formulas are:

g_x = (x_t - x_a) / w_a,   g_y = (y_t - y_a) / h_a

g_w = log(w_t / w_a),   g_h = log(h_t / h_a)
the actual offset obtained is l = (l_x, l_y, l_w, l_h), with specific solving formulas:

l_x = (x - x_a) / w_a,   l_y = (y - y_a) / h_a

l_w = log(w / w_a),   l_h = log(h / h_a)
the coordinate position of the prediction frame is regressed with a Smooth L1 loss function, with corresponding solving formula:

L_loc = Σ_{i ∈ Pos} smooth_L1(l_i - g_i)

where i ranges over the set of all positive samples, i.e., the prediction frame set;

and the Smooth L1 loss function is:

smooth_L1(x) = 0.5 x^2 if |x| < 1, and |x| - 0.5 otherwise.
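The offset encoding and the Smooth L1 regression loss above can be sketched in PyTorch as follows; the tensor layouts are assumptions, and F.smooth_l1_loss with its default beta of 1.0 matches the |x| < 1 case split:

```python
import torch
import torch.nn.functional as F

def encode_offsets(anchor, target):
    """g = (g_x, g_y, g_w, g_h); both inputs are [..., 4] in (cx, cy, w, h)."""
    gx = (target[..., 0] - anchor[..., 0]) / anchor[..., 2]
    gy = (target[..., 1] - anchor[..., 1]) / anchor[..., 3]
    gw = torch.log(target[..., 2] / anchor[..., 2])
    gh = torch.log(target[..., 3] / anchor[..., 3])
    return torch.stack([gx, gy, gw, gh], dim=-1)

def regression_loss(pred_offsets, anchors, gt_boxes):
    """Sum of Smooth L1 between the predicted offsets l and the encoded
    ground-truth offsets g over the positive samples."""
    g = encode_offsets(anchors, gt_boxes)
    return F.smooth_l1_loss(pred_offsets, g, reduction='sum')
```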
after a value is regressed through wx + b on the feature map corresponding to the prediction frame, the category information of the corresponding prediction frame is obtained through a softmax loss function:

L_cls = -α log( e^{s_k} / Σ_c e^{s_c} )

where c is a prediction label (a predicted class), k is the true label (the true class), s_c is the score regressed for class c, L_cls is the classification loss function, and α is a hyperparameter that can be adjusted during experiments; the categories include plants, televisions, boats, chairs, etc.
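A hedged PyTorch sketch of this classification term; treating α as a global weight on the cross-entropy (softmax log-loss) is an assumption, since its exact placement in the loss is not recoverable from the text:

```python
import torch.nn.functional as F

def classification_loss(logits, labels, alpha=1.0):
    """logits: [N, num_classes] scores from wx + b; labels: [N] true classes k."""
    return alpha * F.cross_entropy(logits, labels, reduction='sum')
```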
Randomly initializing the parameters of the small target detection and identification network refers to pre-training the network on a larger public dataset to obtain a set of initialized parameters, where the larger public dataset is ImageNet.
S2, extracting small target sample image data based on the COCO data set (small targets account for about 41% of the COCO data set), i.e., extracting sample images whose target pixel area is smaller than 32x32, and preprocessing the extracted small target sample image data; inputting the obtained preprocessed small target sample image data into the parameter-initialized small target detection and recognition network for training to obtain a trained small target detection and recognition network;
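Extracting the small-target subset can be sketched with the standard pycocotools API; the annotation-file path and the helper name are placeholders:

```python
from pycocotools.coco import COCO

def small_object_image_ids(ann_file, max_area=32 * 32):
    """Ids of images containing at least one annotation below 32x32 pixels."""
    coco = COCO(ann_file)
    anns = coco.loadAnns(coco.getAnnIds())
    return sorted({a['image_id'] for a in anns if a['area'] < max_area})
```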
as shown in fig. 4, when performing small target detection and recognition network training, a central point discriminating module block is added for predicting the candidate frame obtained by the prediction output module based on the candidate frame generated by the candidate frame generating module, the k-nearest neighbor method and the non-maximum suppression method, and the specific steps are as follows: according to the central point position of the real frame of the small target sample image data, determining candidate frames corresponding to k nearest central point positions around the small target sample image data by using a k nearest neighbor method, taking the candidate frames as preliminarily selected candidate frames, processing the candidate frames by using the k nearest neighbor method, and obtaining the best candidate frame by using a non-maximum suppression method. The central point discrimination module can eliminate a plurality of redundant candidate frames so as to reduce the calculation of the frames in the subsequent steps; it can be prevented that small candidate blocks are lost as negative sample pairs in that part of the NMS.
Preprocessing the extracted small target sample image data refers to applying plus/minus 90-degree rotation, random cropping or scale scaling to the small target sample image data;
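The preprocessing operations can be sketched as below; the crop and scale ranges are assumptions, and a real pipeline would also transform the ground-truth frames along with the image:

```python
import numpy as np
import cv2  # OpenCV, assumed here for resizing

def preprocess(image, rng=np.random):
    """Apply one of: +/-90-degree rotation, random crop, or scale scaling."""
    choice = rng.randint(3)
    if choice == 0:                       # rotate by +90 or -90 degrees
        return np.rot90(image, k=rng.choice([1, 3])).copy()
    if choice == 1:                       # random crop to 3/4 size
        h, w = image.shape[:2]
        y0, x0 = rng.randint(h // 4 + 1), rng.randint(w // 4 + 1)
        return image[y0:y0 + 3 * h // 4, x0:x0 + 3 * w // 4]
    s = rng.uniform(0.5, 1.5)             # random scale factor
    return cv2.resize(image, None, fx=s, fy=s)
```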
the specific implementation process of obtaining the trained small target detection and identification network is as follows: inputting the small target sample image data obtained by preprocessing into a small target detection and identification network with initialized parameters, carrying out forward propagation to obtain regression and classification results, solving the loss of the regression and classification results according to the small target sample image data, updating network parameters of the small target detection and identification network by utilizing the loss backward propagation, and obtaining the trained small target detection and identification network after the iteration condition is reached.
And S3, inputting the small target image to be predicted into the trained small target detection and identification network, and outputting the position and the class information of the prediction frame of the small target end to end through forward propagation. The trained small target detection and recognition network can detect images without real frames, and the method specifically comprises the following steps:
and (3) a testing stage:
1) In the training stage, the weight parameters of the small target detection and identification network are obtained, i.e., the trained small target detection and identification network; the small target image to be predicted is then input;
2) Basic features (such as edge information, color, shape and other information) of the small target image are learned through a basic network module;
3) Learning characteristic diagram information of different scales through a characteristic extraction module to obtain a multi-scale characteristic diagram;
4) After the multi-scale feature maps are obtained, 9 anchors are generated at each pixel position of any one feature map, and each anchor is mapped onto the original image and corresponds to a candidate frame;
5) Performing regression calculation on the coordinates of the candidate box by using the trained weights w and b (wx + b);
6) Obtaining a score value (namely a probability value belonging to any one category) through a softmax function after regression, and using the probability value to perform NMS;
7) The best box is selected as the final predicted value; a minimal sketch of this test stage follows.
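The sketch below strings steps 1)-7) together; the network's output interface and both thresholds are assumptions, and NMS comes from torchvision:

```python
import torch
from torchvision.ops import nms

@torch.no_grad()
def predict(network, image, score_thresh=0.5, iou_thresh=0.5):
    """One forward pass: boxes + class logits -> softmax scores -> NMS."""
    network.eval()
    boxes, logits = network(image.unsqueeze(0))      # assumed interface
    scores, labels = torch.softmax(logits, dim=-1).max(dim=-1)
    keep = scores > score_thresh
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep], labels[keep]
```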
Examples
Small target sample image data are extracted from the COCO data set as a test set, and the small target images in the test set are input into the SSD, the DSSD and the method of the invention respectively for detection, giving the results shown in FIGS. 5 and 6; FIG. 6 shows the detection results of three small target images under the SSD, the DSSD and the method of the invention. The detection precision of the invention is higher than that of the prior art for small, medium and large targets; the SSD and DSSD networks miss many detections on small targets, a problem the proposed model structure clearly improves.
The above are merely representative examples of the many specific applications of the present invention, and do not limit the scope of the invention in any way. All the technical solutions formed by the transformation or the equivalent substitution fall within the protection scope of the present invention.

Claims (6)

1. A small target detection and identification method for enhancing feature learning is characterized by comprising the following steps:
s1, sequentially constructing a basic network module for extracting the features of a small target and outputting a preliminary feature map, a feature extraction module formed by two hourglass stacks for further extracting the features and outputting the feature map on the basis of the preliminary feature map, a candidate frame generation module for generating a candidate frame on the basis of the feature map, and a prediction output module for performing prediction frame coordinate regression and prediction frame class classification on the basis of the candidate frame, wherein the prediction output module is used as a small target detection and identification network, namely a deep neural network, and randomly initializing the parameters of the small target detection and identification network after construction;
s2, extracting small target sample image data based on the COCO data set, namely extracting small target sample image data with pixel point area of less than 32x32, and preprocessing the extracted small target sample image data; inputting the obtained preprocessed small target sample image data into a small target detection and recognition network with initialized parameters for training to obtain a trained small target detection and recognition network;
s3, inputting the small target image to be predicted into the trained small target detection and identification network, and outputting the position and the category information of the prediction frame of the small target end to end through forward propagation;
the basic network module in the step S1 is improved ResNet-101 or improved VGG16;
the improved ResNet-101 sequentially comprises an input layer, a first group of convolutional layers, a maximum pooling layer, a second group of convolutional layers, a third group of convolutional layers, a fourth group of convolutional layers and a fifth group of convolutional layers, wherein the input layer takes 513x513 image data; the first group of convolutional layers sequentially comprises one 7x7 convolution operation and one nonlinear activation function operation; the second group sequentially comprises 9 convolutional layers, a nonlinear activation layer and an average pooling layer; the third group sequentially comprises 12 convolutional layers, a nonlinear activation layer and an average pooling layer; the fourth group sequentially comprises 69 convolutional layers, a nonlinear activation layer and an average pooling layer; and the fifth group sequentially comprises 9 convolutional layers, a nonlinear activation layer and an average pooling layer, wherein each convolutional layer in the second through fifth groups sequentially passes through a 1x1 convolution, a 3x3 convolution and a 1x1 convolution operation;
the improved VGG16 sequentially comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer, wherein the first convolution layer and the second convolution layer sequentially comprise 2 convolution operations with convolution kernels of 3x3 and nonlinear activation function operations, and the third convolution layer, the fourth convolution layer and the fifth convolution layer sequentially comprise 3 convolution operations with convolution kernels of 3x3 and nonlinear activation function operations;
the characteristic extraction module in the step S1 is next to the basic network module; a single hourglass stack in the characteristic extraction module consists of 3-order sampling units arranged in an hourglass shape, and each order's sampling unit comprises a convolution module and an identity mapping module; the convolution module sequentially comprises 1 down-sampling layer, 3 convolution layers and 1 up-sampling layer, wherein the second convolution layer of the convolution module in the second-order sampling unit is the convolution module of the first-order sampling unit, and the second convolution layer of the convolution module in the third-order sampling unit is the convolution module of the second-order sampling unit; each convolution layer is a 3x3 convolution layer, and the down-sampling ratio of the down-sampling layer is

1 / 2^d

where d denotes the d-th branch in the convolution module, and up-sampling in the up-sampling layer adopts bilinear interpolation; the identity mapping module makes a skip connection from the input of the convolution module's down-sampling layer to the output of its up-sampling layer, learning the detail information of shallow features in the deep network.
2. The method for small target detection and identification with enhanced feature learning of claim 1, wherein: the candidate frame generation module generates 9 candidate frames with different sizes at each pixel point position of the feature image output by the feature extraction module based on an anchor generation mechanism, and each candidate frame is mapped to the original image and corresponds to one candidate frame.
3. The method of claim 2, wherein the method comprises the steps of: in the step S1, the prediction output module follows the candidate frame generation module, where performing prediction frame coordinate regression is to regress the coordinate position of the prediction frame through a Smooth L1 loss function, the prediction frame category classification is to obtain category information of the corresponding prediction frame through a softmax loss function after regressing a numerical value through wx + b based on a feature map corresponding to the prediction frame, and x is a pixel point value in the feature map, where the prediction frame is a candidate frame; the method comprises the following specific steps:
the coordinates of the central point of the prediction frame are x and y, and the width and height are w and h respectively; if the center position and size of any candidate (anchor) frame are x_a, y_a, w_a, h_a and those of the real frame are x_t, y_t, w_t, h_t, then, setting the offset between the real frame and the candidate frame to g = (g_x, g_y, g_w, g_h), the specific solving formulas are:

g_x = (x_t - x_a) / w_a,   g_y = (y_t - y_a) / h_a

g_w = log(w_t / w_a),   g_h = log(h_t / h_a)
the actual offset obtained is l = (l_x, l_y, l_w, l_h), with specific solving formulas:

l_x = (x - x_a) / w_a,   l_y = (y - y_a) / h_a

l_w = log(w / w_a),   l_h = log(h / h_a)
the coordinate position of the prediction frame is regressed with a Smooth L1 loss function, with corresponding solving formula:

L_loc = Σ_{i ∈ Pos} smooth_L1(l_i - g_i)

where i ranges over the set of all positive samples, i.e., the prediction frame set;

and the Smooth L1 loss function is:

smooth_L1(x) = 0.5 x^2 if |x| < 1, and |x| - 0.5 otherwise;
the category information of the corresponding prediction frame is obtained through a softmax loss function after a value is regressed through wx + b on the feature map corresponding to the prediction frame, the softmax loss function being:

L_cls = -α log( e^{s_k} / Σ_c e^{s_c} )

where c is a predicted frame label (a predicted class), k is the real frame label (the real frame class of the small target sample image data), s_c is the score regressed for class c, L_cls is the classification loss function, and α is a hyperparameter that can be adjusted during experiments; the categories include plants, televisions, ships and chairs.
4. The method for small object detection and identification with enhanced feature learning of claim 1, wherein: in the step S1, randomly initializing parameters of the small target detection and recognition network refers to pre-training the small target detection and recognition network by using larger public data to obtain a set of initialized parameters, where the larger public data is ImageNet.
5. The method of claim 2, wherein the method comprises the steps of: when small target detection and recognition network training is performed in the step S2, a central point distinguishing module is added for obtaining a candidate frame predicted by the prediction output module based on the candidate frame generated by the candidate frame generation module, the k-nearest neighbor method and the non-maximum suppression method, and the specific steps are as follows: according to the central point position of the real frame of the small target sample image data, determining candidate frames corresponding to k nearest central point positions around the small target sample image data by using a k nearest neighbor method, taking the candidate frames as preliminarily selected candidate frames, processing the candidate frames by using the k nearest neighbor method, and obtaining the optimal candidate frame by using a non-maximum suppression method.
6. The method for small object detection and identification by enhancing feature learning according to claim 1 or 4, wherein: in the step S2, the preprocessing of the extracted small target sample image data refers to performing plus/minus 90-degree rotation, random cropping or scale scaling operation on the small target sample image data;
the specific implementation process of obtaining the trained small target detection and identification network is as follows: inputting the small target sample image data obtained by preprocessing into a small target detection and identification network with initialized parameters, carrying out forward propagation to obtain regression and classification results, solving the loss of the regression and classification results according to the small target sample image data, updating network parameters of the small target detection and identification network by utilizing the loss backward propagation, and obtaining the trained small target detection and identification network after the iteration condition is reached.
CN201910794606.XA 2019-08-27 2019-08-27 Small target detection and identification method for enhancing feature learning Active CN110503112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910794606.XA CN110503112B (en) 2019-08-27 2019-08-27 Small target detection and identification method for enhancing feature learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910794606.XA CN110503112B (en) 2019-08-27 2019-08-27 Small target detection and identification method for enhancing feature learning

Publications (2)

Publication Number Publication Date
CN110503112A CN110503112A (en) 2019-11-26
CN110503112B true CN110503112B (en) 2023-02-03

Family

ID=68589600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910794606.XA Active CN110503112B (en) 2019-08-27 2019-08-27 Small target detection and identification method for enhancing feature learning

Country Status (1)

Country Link
CN (1) CN110503112B (en)

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929795B (en) * 2019-11-28 2022-09-13 桂林电子科技大学 Method for quickly identifying and positioning welding spot of high-speed wire welding machine
CN111080700A (en) * 2019-12-11 2020-04-28 中国科学院自动化研究所 Medical instrument image detection method and device
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium
CN111160372B (en) * 2019-12-30 2023-04-18 沈阳理工大学 Large target identification method based on high-speed convolutional neural network
CN113128308B (en) * 2020-01-10 2022-05-20 中南大学 Pedestrian detection method, device, equipment and medium in port scene
CN111259763B (en) * 2020-01-13 2024-02-02 华雁智能科技(集团)股份有限公司 Target detection method, target detection device, electronic equipment and readable storage medium
CN111242026B (en) * 2020-01-13 2022-07-12 中国矿业大学 Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN111310756B (en) * 2020-01-20 2023-03-28 陕西师范大学 Damaged corn particle detection and classification method based on deep learning
CN111291796A (en) * 2020-01-21 2020-06-16 中国科学技术大学 Sampling-free method used in target detector model training process
CN111444828B (en) * 2020-03-25 2023-06-20 腾讯科技(深圳)有限公司 Model training method, target detection method, device and storage medium
CN111523403B (en) * 2020-04-03 2023-10-20 咪咕文化科技有限公司 Method and device for acquiring target area in picture and computer readable storage medium
CN111524112B (en) * 2020-04-17 2023-04-07 中冶赛迪信息技术(重庆)有限公司 Steel chasing identification method, system, equipment and medium
CN111583204B (en) * 2020-04-27 2022-10-14 天津大学 Organ positioning method of two-dimensional sequence magnetic resonance image based on network model
CN111597945B (en) * 2020-05-11 2023-08-18 济南博观智能科技有限公司 Target detection method, device, equipment and medium
CN111563462A (en) * 2020-05-11 2020-08-21 广东博智林机器人有限公司 Image element detection method and device
CN111611947B (en) * 2020-05-25 2024-04-09 济南博观智能科技有限公司 License plate detection method, device, equipment and medium
CN111626208B (en) * 2020-05-27 2023-06-13 阿波罗智联(北京)科技有限公司 Method and device for detecting small objects
CN113743163A (en) * 2020-05-29 2021-12-03 中移(上海)信息通信科技有限公司 Traffic target recognition model training method, traffic target positioning method and device
CN111814850A (en) * 2020-06-22 2020-10-23 浙江大华技术股份有限公司 Defect detection model training method, defect detection method and related device
CN111767962B (en) * 2020-07-03 2022-11-08 中国科学院自动化研究所 One-stage target detection method, system and device based on generation countermeasure network
CN111986126B (en) * 2020-07-17 2022-05-24 浙江工业大学 Multi-target detection method based on improved VGG16 network
CN112163530B (en) * 2020-09-30 2024-04-09 江南大学 SSD small target detection method based on feature enhancement and sample selection
CN112215179B (en) * 2020-10-19 2024-04-19 平安国际智慧城市科技股份有限公司 In-vehicle face recognition method, device, apparatus and storage medium
CN112417980A (en) * 2020-10-27 2021-02-26 南京邮电大学 Single-stage underwater biological target detection method based on feature enhancement and refinement
CN112364931B (en) * 2020-11-20 2024-03-19 长沙军民先进技术研究有限公司 Few-sample target detection method and network system based on meta-feature and weight adjustment
CN112308062B (en) * 2020-11-23 2022-08-23 浙江卡易智慧医疗科技有限公司 Medical image access number identification method in complex background image
CN112396126B (en) * 2020-12-02 2023-09-22 中山大学 Target detection method and system based on detection trunk and local feature optimization
CN112699914B (en) * 2020-12-02 2023-09-22 中山大学 Target detection method and system based on heterogeneous composite trunk
CN112561898A (en) * 2020-12-22 2021-03-26 电子科技大学中山学院 Optical fiber sensor light spot analysis method based on convolutional neural network
CN112634174B (en) * 2020-12-31 2023-12-12 上海明略人工智能(集团)有限公司 Image representation learning method and system
CN112926383B (en) * 2021-01-08 2023-03-03 浙江大学 Automatic target identification system based on underwater laser image
CN112819793A (en) * 2021-02-01 2021-05-18 宁波港信息通信有限公司 Container damage identification method, device, equipment and readable access medium
CN113158757B (en) * 2021-02-08 2023-04-07 海信视像科技股份有限公司 Display device and gesture control method
CN112990263B (en) * 2021-02-08 2022-12-06 武汉工程大学 Data enhancement method for high-resolution image of dense small target
CN113033672B (en) * 2021-03-29 2023-07-28 西安电子科技大学 Multi-class optical image rotation target self-adaptive detection method based on feature enhancement
CN113221947A (en) * 2021-04-04 2021-08-06 青岛日日顺乐信云科技有限公司 Industrial quality inspection method and system based on image recognition technology
CN113361322B (en) * 2021-04-23 2022-09-27 山东大学 Power line target detection method and device based on weighted deconvolution layer number improved DSSD algorithm and storage medium
CN113221731B (en) * 2021-05-10 2023-10-27 西安电子科技大学 Multi-scale remote sensing image target detection method and system
CN113159215A (en) * 2021-05-10 2021-07-23 河南理工大学 Small target detection and identification method based on fast Rcnn
CN113128476A (en) * 2021-05-17 2021-07-16 广西师范大学 Low-power consumption real-time helmet detection method based on computer vision target detection
CN113361437A (en) * 2021-06-16 2021-09-07 吉林建筑大学 Method and system for detecting category and position of minimally invasive surgical instrument
CN113920322A (en) * 2021-10-21 2022-01-11 广东工业大学 Modular robot kinematic chain configuration identification method and system
CN113822277B (en) * 2021-11-19 2022-02-18 万商云集(成都)科技股份有限公司 Illegal advertisement picture detection method and system based on deep learning target detection
CN114240844B (en) * 2021-11-23 2023-03-14 电子科技大学 Unsupervised key point positioning and target detection method in medical image
CN114241232B (en) * 2021-11-23 2023-04-18 电子科技大学 Multi-task learning-based camera position identification and body surface anatomical landmark detection method
CN114529951B (en) * 2022-02-22 2024-04-02 北京工业大学 On-site fingerprint feature point extraction method based on deep learning
CN114359742B (en) * 2022-03-21 2022-09-16 济南大学 Weighted loss function calculation method for optimizing small target detection
CN115984846B (en) * 2023-02-06 2023-10-10 山东省人工智能研究院 Intelligent recognition method for small targets in high-resolution image based on deep learning
CN117994251A (en) * 2024-04-03 2024-05-07 华中科技大学同济医学院附属同济医院 Method and system for evaluating severity of diabetic foot ulcer based on artificial intelligence

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000036524A1 (en) * 1998-12-16 2000-06-22 Sarnoff Corporation Method and apparatus for training a neural network to detect objects in an image
CN108805203A (en) * 2018-06-11 2018-11-13 腾讯科技(深圳)有限公司 Image procossing and object recognition methods, device, equipment and storage medium again
CN109117876A (en) * 2018-07-26 2019-01-01 成都快眼科技有限公司 A kind of dense small target deteection model building method, model and detection method
CN108960212A (en) * 2018-08-13 2018-12-07 电子科技大学 Based on the detection of human joint points end to end and classification method
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109784476A (en) * 2019-01-12 2019-05-21 福州大学 A method of improving DSOD network
CN109977812A (en) * 2019-03-12 2019-07-05 南京邮电大学 A kind of Vehicular video object detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Stacked Hourglass CNN for Handwritten Character Location;H. Clark-Younger et al.;《2018 International Conference on Image and Vision Computing New Zealand (IVCNZ)》;20190207;第1-6页 *
Small Target Detection Based on Deep Convolutional Neural Networks; Guo Zhixian; China Master's Theses Full-text Database, Information Science and Technology Series (Monthly), No. 08, 2018; 2018-08-15; full text *

Also Published As

Publication number Publication date
CN110503112A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
CN110503112B (en) Small target detection and identification method for enhancing feature learning
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN113610822B (en) Surface defect detection method based on multi-scale information fusion
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN111079739B (en) Multi-scale attention feature detection method
CN110991444B (en) License plate recognition method and device for complex scene
CN112861635B (en) Fire disaster and smoke real-time detection method based on deep learning
CN114972213A (en) Two-stage mainboard image defect detection and positioning method based on machine vision
CN110009622B (en) Display panel appearance defect detection network and defect detection method thereof
CN111754507A (en) Light-weight industrial defect image classification method based on strong attention machine mechanism
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
CN116485709A (en) Bridge concrete crack detection method based on YOLOv5 improved algorithm
CN111753682A (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN115620180A (en) Aerial image target detection method based on improved YOLOv5
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
Ma et al. Intelligent detection model based on a fully convolutional neural network for pavement cracks
CN115439694A (en) High-precision point cloud completion method and device based on deep learning
CN116152226A (en) Method for detecting defects of image on inner side of commutator based on fusible feature pyramid
Fu et al. Extended efficient convolutional neural network for concrete crack detection with illustrated merits
CN112837281B (en) Pin defect identification method, device and equipment based on cascade convolution neural network
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN117437201A (en) Road crack detection method based on improved YOLOv7
CN117253188A (en) Transformer substation grounding wire state target detection method based on improved YOLOv5
CN115424276B (en) Ship license plate number detection method based on deep learning technology
CN115272819A (en) Small target detection method based on improved Faster-RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant