CN111898668A - Small target object detection method based on deep learning - Google Patents

Small target object detection method based on deep learning

Info

Publication number
CN111898668A
Authority
CN
China
Prior art keywords
image
small target
training
target object
objects
Prior art date
Legal status
Pending
Application number
CN202010723829.XA
Other languages
Chinese (zh)
Inventor
杨海东
巴姗姗
黄坤山
Current Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Original Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Priority date: 2020-07-24
Filing date: 2020-07-24
Publication date: 2020-11-06
Application filed by Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute, Foshan Guangdong University CNC Equipment Technology Development Co. Ltd filed Critical Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority to CN202010723829.XA
Publication of CN111898668A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small target object detection method based on deep learning, which overcomes problems of existing small-target detection methods such as low detection efficiency and poor accuracy. First, images containing no small target objects are extracted from the COCO data set, resized, and stitched together; the stitched images and the COCO images that do contain small target objects form a new data set, which is divided into a training set and a test set at a ratio of 4:1. Next, the basic feature extraction network of Faster-RCNN is modified to perform feature fusion. Candidate regions are then selected from each level of fused features through an RPN. The improved network is trained on the training set to obtain a trained model, and finally the test set is input into the trained model for target detection.

Description

Small target object detection method based on deep learning
Technical Field
The invention relates to the technical field of target detection, in particular to a small target object detection method based on deep learning.
Background
Object detection is a fundamental computer vision task that combines two sub-tasks, object localization and recognition: it aims to find the multiple objects present against the complex background of an image, provide an accurate bounding box for each object, and judge the category of the object inside the box. Target detection technology is widely applied in daily life, for example in target tracking and recognition, face recognition, text detection, pedestrian detection, medical diagnosis and intelligent monitoring systems, and as a basic task its rapid development in recent years has promoted progress in other vision tasks. Although deep-learning-based methods work well on generic target detection data sets, they still do not solve small target detection well, mainly because small target detection suffers from two problems:
(1) Insufficient information: the target occupies a very small area in the image, so the information that the pixels of the corresponding region can convey is very limited.
(2) Scarce data: few images in the data set contain small objects, which leaves the whole training set unbalanced across scales. For example, in the COCO data set, although small, medium and large objects account for roughly 42%, 34% and 24% of all objects respectively, only about 52% of the images contain small objects, whereas 71% and 83% of the images contain medium and large objects respectively. In other words, in some images most objects are small, yet only about half of the images contain any small object at all; this severe imbalance during training causes small targets to be detected with much lower accuracy than medium and large objects.
Small targets exist in small numbers in ordinary images, and also appear in images captured by unmanned-aerial-vehicle cameras, communication-base-station cameras and other image capturing devices mounted at greater heights; research on small target detection is therefore very important for analyzing and exploiting such images. Accordingly, further improvement is needed in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a small target object detection method based on deep learning.
The purpose of the invention is realized by the following technical scheme:
a small target object detection method based on deep learning mainly comprises the following specific steps:
step S1: extracting an image without a small target object based on the COCO data set, splicing the image after adjusting the size of the image, forming a new data set by the spliced image and the image with the small target object in the COCO data set, and carrying out image splicing according to the ratio of 4: a scale of 1 divides the data set into a training set and a test set.
Step S2: and modifying the basic feature extraction network of the Faster-RCNN to perform feature fusion.
Step S3: and selecting a candidate region through the RPN according to each level of fusion characteristics after fusion.
Step S4: and inputting the training images into an improved network for training, and constructing a loss function according to target classification and regression.
Step S5: and repeatedly selecting the training picture until the loss function is converged and storing the training model.
Step S6: and inputting the test set into the trained model for testing.
Further, in step S1, the images in the COCO data set that contain no small target objects are resized and the resized images are stitched. This addresses the imbalance of target-object sizes during training: the stitched image has the same size as a regular image, and in this way large and medium objects are reduced to medium and small objects, balancing the distribution of objects of different scales during training.
Furthermore, in the image stitching process, k regular images of uniform resolution (size W × H) are scaled by nearest-neighbour interpolation and then combined into a stitched image. To preserve the properties of the original images, each scaled image keeps the original W : H aspect ratio and is resized to (W/√k) × (H/√k); in particular, when k = 1 the stitched image reduces to a regular image.
Further, the nearest-neighbour interpolation process is as follows: assume the original image has pixel size W × H and the scaled image has pixel size w × h, and each pixel of the original image has integer coordinates. A pixel (x, y) of the scaled image corresponds to the point (W/w · x, H/h · y) in the original image; because of the scaling, this point is not necessarily integer-valued, so its coordinates are rounded to integers, and, writing g for the pixel value, g(x, y) = g([W/w · x], [H/h · y]).
As a preferred embodiment of the present invention, in step S2 the basic feature extraction network module of Faster-RCNN is the residual network ResNet-50, comprising input, conv1, max pooling, conv2_x, conv3_x, conv4_x and conv5_x. The conv1 layer contains one convolution with a 7 × 7 kernel and stride 2; the conv2_x, conv3_x, conv4_x and conv5_x layers contain 3, 4, 6 and 3 residual blocks respectively; each residual block contains three convolution layers with kernel sizes 1 × 1, 3 × 3 and 1 × 1 in sequence. The 3 × 3 convolution layer of the first residual block of the conv3_x, conv4_x and conv5_x layers has stride 2, so that downsampling reduces the resolution while the depth increases; all remaining convolution layers have stride 1.
As a preferred scheme of the present invention, in step S2 the improved Faster-RCNN feature extraction network adopts, on top of the basic feature extraction network, a top-down feature fusion structure with lateral connections. In the basic feature extraction network module, although the top-level features carry highly abstract semantic information, repeated pooling and downsampling make them insensitive to features such as object edges and geometric information, so small target objects are harder to characterize. Notably, shallow feature maps usually have high resolution and rich geometric detail, while top-level feature maps have stronger semantic abstraction and robustness to object pose and position changes but lower resolution. Therefore, on the basis of the bottom-up network structure, a top-down feature fusion structure with lateral connections is adopted: the top-level feature maps are enlarged in resolution by upsampling and merged with the shallow feature maps to generate feature maps that are both high-resolution and semantically rich. After fused feature maps of the different levels are obtained through feature fusion, a convolution with a 3 × 3 kernel is applied to the fused feature map of each level to remove the aliasing effect caused by upsampling.
Further, the upsampling enlarges the spatial size of the deep features through bilinear interpolation. Bilinear interpolation performs one-dimensional linear interpolation twice, once along the u direction and once along the v direction. Suppose the point (u0, v0) of the original image corresponding to a pixel of the new image has non-integer coordinates u0 and v0; it must then fall among four pixels of the original image, namely (u′, v′), (u′, v′+1), (u′+1, v′) and (u′+1, v′+1). First, one-dimensional linear interpolation between (u′, v′) and (u′+1, v′) gives g(u0, v′); then one-dimensional linear interpolation between (u′, v′+1) and (u′+1, v′+1) gives g(u0, v′+1); finally, one-dimensional linear interpolation between the obtained points (u0, v′) and (u0, v′+1) gives g(u0, v0).
As a preferred embodiment of the present invention, in step S3 the obtained feature maps are input into the RPN to locate candidate targets: a Softmax binary classifier judges whether each obtained candidate target belongs to the foreground or the background, while a bounding-box regressor corrects the positions of the candidate targets, thereby obtaining target candidate regions. Finally, the final feature maps and the candidate regions generated by the RPN are sent into the Fast-RCNN network to realize the classification and regression of the targets.
As a preferable aspect of the present invention, in step S4 a loss function is established based on target classification and regression. The classification loss is defined as:

L_cls(p_i, p_i*) = -log[p_i* · p_i + (1 - p_i*)(1 - p_i)]

For regression of the bounding box, the smooth-L1 loss is adopted, defined as:

L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*), with smooth_L1(x) = 0.5x² if |x| < 1 and |x| - 0.5 otherwise

The loss function is therefore:

L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

wherein N_cls is the number of anchor boxes used in training the RPN; p_i is the probability that anchor i is predicted as a target, with p_i* = 1 when the prediction result is a positive sample and p_i* = 0 when it is a negative sample; N_reg is the number of anchor points; t_i is the offset predicted in the RPN training phase and t_i* is the offset relative to the real box; λ is a balance parameter between the two terms.
As a preferred scheme of the present invention, in step S5, when pictures are repeatedly selected for training, the loss in the current iteration is used as feedback to adaptively determine the input selection of the next iteration. In the current iteration t, if the loss of small objects is negligible, i.e. the small-target loss ratio r_t = L_t^small / L_t is less than a certain threshold, the input of iteration t+1 is a stitched image; otherwise the input remains a regular image, as in the default setting.
The working process and principle of the invention are as follows: the invention discloses a small target object detection method based on deep learning, which overcomes problems of existing small-target detection methods such as low detection efficiency and poor accuracy. First, images containing no small target objects are extracted from the COCO data set, resized and stitched; the stitched images and the COCO images containing small target objects form a new data set, which is divided into a training set and a test set at a ratio of 4:1. Next, the basic feature extraction network of Faster-RCNN is modified to perform feature fusion; candidate regions are then selected from each level of fused features through an RPN; the improved network is trained on the training set to obtain a trained model; finally, the test set is input into the trained model for target detection.
Compared with the prior art, the invention also has the following advantages:
(1) The small target object detection method based on deep learning provided by the invention uses stitched images, turning some large and medium objects into medium and small objects and balancing the distribution of objects of different scales during training.
(2) The method improves the basic feature extraction network of Faster-RCNN, fusing shallow and deep features to generate feature maps that are both high-resolution and semantically rich, and applying a convolution to the fused features to remove the aliasing effect caused by upsampling.
(3) During training, the method uses the loss in the current iteration as feedback to adaptively determine the input of the next iteration: the small-target loss ratio decides whether the next input is a regular image or a stitched image, which improves the accuracy of small target detection.
Drawings
Fig. 1 is an overall flowchart of a small target object detection method based on deep learning according to the present invention.
Fig. 2 is a schematic diagram of an improved feature extraction network provided by the present invention.
FIG. 3 is a flow chart of input image class selection in a training process provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and examples.
Example 1:
as shown in fig. 1 to 3, the present embodiment discloses a small target object detection method based on deep learning, which includes the following steps:
step 1: extracting images containing no small target objects based on the COCO data set, resizing and stitching them, forming a new data set from the stitched images and the COCO images containing small target objects, and dividing the data set into a training set and a test set at a ratio of 4:1;
step 2: modifying the basic feature extraction network of Faster-RCNN to perform feature fusion;
step 3: selecting candidate regions from each level of fused features through an RPN;
step 4: inputting the training images into the improved network for training, and constructing a loss function according to target classification and regression;
step 5: repeatedly selecting training pictures until the loss function converges, and saving the trained model;
step 6: inputting the test set into the trained model for testing.
In step 1, the images in the COCO data set that contain no small target objects are resized, and the resized images are stitched together. This addresses the imbalance of target-object sizes during training: the stitched image has the same size as a regular image, and in this way large and medium objects are reduced to medium and small objects, balancing the distribution of objects of different scales during training. In regular images, objects may be blurred by photographic problems such as defocus or motion blur; although resizing a regular image also shrinks its medium-sized and larger objects, their outlines and details remain clearer than those of native small objects.
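As an illustration of this data preparation, the sketch below separates COCO images by whether they contain a small object and performs the 4:1 split; it assumes pycocotools and a local annotation file named instances_train2017.json (both our assumptions), and follows the COCO convention that objects with area under 32 × 32 count as small. The stitching of the images without small objects is sketched further below.

```python
import random
from pycocotools.coco import COCO

SMALL_AREA = 32 * 32  # COCO convention for "small" objects

coco = COCO("instances_train2017.json")  # assumed local annotation file
with_small, without_small = [], []
for img_id in coco.getImgIds():
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    if any(a["area"] < SMALL_AREA for a in anns):
        with_small.append(img_id)    # kept as-is in the new data set
    else:
        without_small.append(img_id)  # pool of images to resize and stitch

# The stitched results plus the images that already contain small objects
# form the new data set, divided 4:1 into training and test sets.
new_dataset = with_small + without_small
random.shuffle(new_dataset)
split = int(len(new_dataset) * 4 / 5)
train_ids, test_ids = new_dataset[:split], new_dataset[split:]
```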
In the image stitching process, k regular images of uniform resolution (size W × H) are scaled by nearest-neighbour interpolation and then combined into a stitched image. To preserve the properties of the original images, each scaled image keeps the original W : H aspect ratio and is resized to (W/√k) × (H/√k); in particular, when k = 1 the stitched image reduces to a regular image.
The principle of nearest-neighbour interpolation is as follows: assume the original image has pixel size W × H and the scaled image has pixel size w × h, and each pixel of the original image has integer coordinates. A pixel (x, y) of the scaled image corresponds to the point (W/w · x, H/h · y) in the original image; because of the scaling, this point is not necessarily integer-valued, so its coordinates are rounded to integers, and, writing g for the pixel value, g(x, y) = g([W/w · x], [H/h · y]).
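A minimal sketch of this scaling-and-stitching for k = 4 follows, assuming the inputs are numpy arrays of one common size; the helper names are ours, not the patent's.

```python
import numpy as np

def nearest_resize(img, w, h):
    """Nearest-neighbour scaling: g(x, y) = g(round(W/w * x), round(H/h * y))."""
    H_src, W_src = img.shape[:2]
    xs = np.minimum((np.arange(w) * W_src / w).round().astype(int), W_src - 1)
    ys = np.minimum((np.arange(h) * H_src / h).round().astype(int), H_src - 1)
    return img[ys[:, None], xs[None, :]]

def stitch4(imgs):
    """Scale 4 regular W x H images to W/2 x H/2 (aspect ratio preserved,
    since k = 4 gives a 1/sqrt(k) = 1/2 factor) and tile them into one
    stitched image of the same size as a regular image."""
    H, W = imgs[0].shape[:2]
    tiles = [nearest_resize(im, W // 2, H // 2) for im in imgs]
    top = np.concatenate(tiles[:2], axis=1)
    bottom = np.concatenate(tiles[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)
```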
In step 2, the basic feature extraction network module of Faster-RCNN is the residual network ResNet-50, comprising input, conv1, max pooling, conv2_x, conv3_x, conv4_x and conv5_x. The conv1 layer contains one convolution with a 7 × 7 kernel and stride 2; the conv2_x, conv3_x, conv4_x and conv5_x layers contain 3, 4, 6 and 3 residual blocks respectively; each residual block contains three convolution layers with kernel sizes 1 × 1, 3 × 3 and 1 × 1 in sequence. The 3 × 3 convolution layer of the first residual block of the conv3_x, conv4_x and conv5_x layers has stride 2, so that downsampling reduces the resolution while the depth increases; all remaining convolution layers have stride 1.
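For reference, the same stage layout can be traced through torchvision's ResNet-50, a stand-in implementation not named by the patent (torchvision ≥ 0.13 assumed for the weights argument); layer1 to layer4 correspond to conv2_x to conv5_x.

```python
import torch
import torchvision

resnet = torchvision.models.resnet50(weights=None)  # random weights suffice here
x = torch.randn(1, 3, 800, 800)

x = resnet.conv1(x)     # 7x7 conv, stride 2
x = resnet.bn1(x)
x = resnet.relu(x)
x = resnet.maxpool(x)   # max pooling, stride 2
c2 = resnet.layer1(x)   # conv2_x: 3 residual blocks, stride 1
c3 = resnet.layer2(c2)  # conv3_x: 4 blocks, 3x3 conv of first block has stride 2
c4 = resnet.layer3(c3)  # conv4_x: 6 blocks, stride 2 in first block
c5 = resnet.layer4(c4)  # conv5_x: 3 blocks, stride 2 in first block
print(c2.shape, c3.shape, c4.shape, c5.shape)
# (1, 256, 200, 200) (1, 512, 100, 100) (1, 1024, 50, 50) (1, 2048, 25, 25)
```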
In step 2, the improved Faster-RCNN feature extraction network adopts, on top of the basic feature extraction network, a top-down feature fusion structure with lateral connections. In the basic feature extraction network module, although the top-level features carry highly abstract semantic information, repeated pooling and downsampling make them insensitive to features such as object edges and geometric information, so small target objects are harder to characterize. Notably, shallow feature maps usually have high resolution and rich geometric detail, while top-level feature maps have stronger semantic abstraction and robustness to object pose and position changes but lower resolution. Therefore, on the basis of the bottom-up network structure, a top-down feature fusion structure with lateral connections can be adopted: the top-level feature maps are enlarged in resolution by upsampling and merged with the shallow feature maps to generate feature maps that are both high-resolution and semantically rich. After the fused feature maps of different levels are obtained through feature fusion, a convolution with a 3 × 3 kernel is applied to the fused feature map of each level to remove the aliasing effect caused by upsampling; the specific structure is shown in fig. 2.
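A minimal PyTorch sketch of this top-down fusion with lateral connections follows; the 256-channel width of the fused maps and the module names are our assumptions, while the input channel counts match the ResNet-50 stages above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions align the channel depth of c2..c5
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions smooth each fused map (removes upsampling aliasing)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):  # feats = (c2, c3, c4, c5), shallow to deep
        maps = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(maps) - 2, -1, -1):  # walk top-down
            maps[i] = maps[i] + F.interpolate(
                maps[i + 1], size=maps[i].shape[-2:],
                mode="bilinear", align_corners=False)
        return [sm(m) for sm, m in zip(self.smooth, maps)]
```

Feeding the (c2, c3, c4, c5) maps from the previous sketch through TopDownFusion() yields one fused, smoothed map per level.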
By upsampling, the spatial size of the deep features is enlarged through bilinear interpolation. Bilinear interpolation performs one-dimensional linear interpolation twice, once along the u direction and once along the v direction. Suppose the point (u0, v0) of the original image corresponding to a pixel of the new image has non-integer coordinates u0 and v0; it must then fall among four pixels of the original image, namely (u′, v′), (u′, v′+1), (u′+1, v′) and (u′+1, v′+1). First, one-dimensional linear interpolation between (u′, v′) and (u′+1, v′) gives g(u0, v′) = (1-α) · g(u′, v′) + α · g(u′+1, v′); then one-dimensional linear interpolation between (u′, v′+1) and (u′+1, v′+1) gives g(u0, v′+1) = (1-α) · g(u′, v′+1) + α · g(u′+1, v′+1); finally, one-dimensional linear interpolation between the obtained points (u0, v′) and (u0, v′+1) gives g(u0, v0) = (1-β) · g(u0, v′) + β · g(u0, v′+1). Therefore:

g(u0, v0) = (1-α)(1-β) · g(u′, v′) + α(1-β) · g(u′+1, v′) + β(1-α) · g(u′, v′+1) + α·β · g(u′+1, v′+1)

wherein α = u0 - u′ and β = v0 - v′.
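Transcribed directly into numpy, the interpolation reads as below (a sketch; treating u as the column index and clamping at the image border are our choices, which the derivation leaves unspecified).

```python
import numpy as np

def bilinear_sample(img, u0, v0):
    """g(u0,v0) = (1-a)(1-b) g(u',v') + a(1-b) g(u'+1,v')
                + b(1-a) g(u',v'+1) + a*b g(u'+1,v'+1)."""
    u, v = int(np.floor(u0)), int(np.floor(v0))
    a, b = u0 - u, v0 - v                # a = alpha, b = beta
    u1 = min(u + 1, img.shape[1] - 1)    # clamp right border
    v1 = min(v + 1, img.shape[0] - 1)    # clamp bottom border
    return ((1 - a) * (1 - b) * img[v, u] + a * (1 - b) * img[v, u1]
            + b * (1 - a) * img[v1, u] + a * b * img[v1, u1])
```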
In step 3, the feature maps obtained in step 2 are input into the RPN to locate candidate targets; a Softmax binary classifier judges whether each obtained candidate target belongs to the foreground or the background, while a bounding-box regressor corrects the positions of the candidate targets, thereby obtaining target candidate regions; finally, the final feature maps and the candidate regions generated by the RPN are sent into the Fast-RCNN network, finally realizing the classification and regression of the targets.
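At the usage level, this RPN-plus-Fast-RCNN pipeline over fused feature levels resembles torchvision's Faster R-CNN with a ResNet-50 FPN backbone; the sketch below uses that stock model as a stand-in for the patent's modified network (torchvision ≥ 0.13 assumed for the weights argument).

```python
import torch
import torchvision

# The RPN proposes candidate regions on every fused feature level; the
# Fast-RCNN head then classifies each proposal and regresses its box.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=91)  # 91 = COCO classes incl. background
model.eval()
with torch.no_grad():
    out = model([torch.rand(3, 600, 800)])[0]
print(out["boxes"].shape, out["scores"].shape)  # per-image detections
```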
In step 4, a loss function is established based on target classification and regression. The classification loss is defined as:

L_cls(p_i, p_i*) = -log[p_i* · p_i + (1 - p_i*)(1 - p_i)]

For regression of the bounding box, the smooth-L1 loss is adopted, defined as:

L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*), with smooth_L1(x) = 0.5x² if |x| < 1 and |x| - 0.5 otherwise

The loss function is therefore:

L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

wherein N_cls is the number of anchor boxes used in training the RPN; p_i is the probability that anchor i is predicted as a target, with p_i* = 1 when the prediction result is a positive sample and p_i* = 0 when it is a negative sample; N_reg is the number of anchor points; t_i is the offset predicted in the RPN training phase and t_i* is the offset relative to the real box; λ is a balance parameter between the two terms.
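A PyTorch sketch of this loss is given below. For p_i* in {0, 1}, binary cross-entropy equals the log loss above; the value of λ and the exact normalizers are placeholders, since the text leaves them open.

```python
import torch
import torch.nn.functional as F

def rpn_loss(p, p_star, t, t_star, lam=10.0):
    """p: (N,) predicted objectness; p_star: (N,) 1 = positive, 0 = negative;
    t, t_star: (N, 4) predicted / ground-truth box offsets."""
    n_cls = p.numel()  # N_cls: anchors used for training
    l_cls = F.binary_cross_entropy(p, p_star, reduction="sum") / n_cls
    pos = p_star > 0   # the p_i* factor keeps regression to positives only
    n_reg = p.numel()  # N_reg: number of anchor points (placeholder choice)
    l_reg = F.smooth_l1_loss(t[pos], t_star[pos], reduction="sum") / n_reg
    return l_cls + lam * l_reg

# e.g. rpn_loss(torch.rand(256), (torch.rand(256) > 0.5).float(),
#               torch.randn(256, 4), torch.randn(256, 4))
```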
In step 5, when pictures are repeatedly selected for training, the loss in the current iteration is used as feedback to adaptively determine the input selection of the next iteration, as shown in fig. 3. In the current iteration t, if the loss of small objects is negligible, i.e. the small-target loss ratio r_t (defined below) is less than a certain threshold, the input of iteration t+1 is a stitched image; otherwise the input remains a regular image, as in the default setting.
The loss proportion of small objects is calculated as follows:

r_t = ( Σ_{o : A_o ≤ A_s} L_t^o ) / L_t

wherein A_o = w_o × h_o is the area of object o with width w_o and height h_o, the numerator sums the losses L_t^o of all small objects whose area A_o does not exceed A_s (here A_s = 1024), and L_t denotes the loss of all objects in the current image. The ratio r_t serves as feedback to guide the learning of the next iteration.
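This feedback rule can be sketched as follows, assuming the training step exposes per-object losses and areas; the threshold value is a placeholder for the "certain threshold" in the text, and the loader names are hypothetical.

```python
def next_input_kind(obj_losses, obj_areas, threshold=0.1, area_small=1024):
    """Return "stitched" when small objects contribute too little loss."""
    total = sum(obj_losses)
    small = sum(l for l, a in zip(obj_losses, obj_areas) if a <= area_small)
    r_t = small / total if total > 0 else 0.0  # small-target loss ratio
    return "stitched" if r_t < threshold else "regular"

# Inside the training loop (schematically):
#   losses, areas = train_step(batch)  # hypothetical helper
#   kind = next_input_kind(losses, areas)
#   batch = next(stitched_loader) if kind == "stitched" else next(regular_loader)
```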
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A small target object detection method based on deep learning is characterized by comprising the following steps:
step S1: extracting images containing no small target objects based on the COCO data set, resizing and stitching them, forming a new data set from the stitched images and the COCO images containing small target objects, and dividing the data set into a training set and a test set at a ratio of 4:1;
step S2: modifying a basic feature extraction network of the Faster-RCNN to perform feature fusion;
step S3: selecting candidate regions from each level of fused features through an RPN;
step S4: inputting the training image into an improved network for training, and constructing a loss function according to target classification and regression;
step S5: repeatedly selecting a training picture until the loss function is converged and storing the training model;
step S6: and inputting the test set into the trained model for testing.
2. The method for detecting small target objects based on deep learning of claim 1, wherein in step S1, the images in the COCO data set that do not include small target objects are resized and the resized images are stitched together; this addresses the imbalance of target-object sizes during training: the stitched image has the same size as a regular image, and in this way large and medium objects are reduced to medium and small objects, so that the distribution of objects of different scales during training is balanced.
3. The method for detecting the small target object based on the deep learning of claim 2, wherein in the image stitching process, k regular images of uniform resolution (size W × H) are scaled by nearest-neighbour interpolation and then combined into a stitched image; to preserve the properties of the original images, each scaled image keeps the original W : H aspect ratio and is resized to (W/√k) × (H/√k); in particular, when k = 1 the stitched image reduces to a regular image.
4. The deep learning-based small target object detection method according to claim 3, wherein the nearest-neighbour interpolation process is: first, assume the original image has pixel size W × H and the scaled image has pixel size w × h, with each pixel of the original image having integer coordinates; a pixel (x, y) of the scaled image corresponds to the point (W/w · x, H/h · y) in the original image; because of the scaling, this point is not necessarily integer-valued, so its coordinates are rounded to integers, and, writing g for the pixel value, g(x, y) = g([W/w · x], [H/h · y]).
5. The deep learning-based small target object detection method according to claim 1, wherein in step S2 the basic feature extraction network module of Faster-RCNN is the residual network ResNet-50, comprising input, conv1, max pooling, conv2_x, conv3_x, conv4_x and conv5_x, wherein the conv1 layer comprises one convolution with a 7 × 7 kernel and stride 2; the conv2_x, conv3_x, conv4_x and conv5_x layers respectively comprise 3, 4, 6 and 3 residual blocks; each residual block comprises three convolution layers with kernel sizes 1 × 1, 3 × 3 and 1 × 1 in sequence; the 3 × 3 convolution layer of the first residual block of the conv3_x, conv4_x and conv5_x layers has stride 2, in order to downsample so that the resolution decreases while the depth increases, and all remaining convolution layers have stride 1.
6. The method for detecting small target objects based on deep learning of claim 1, wherein in step S2 the improved Faster-RCNN feature extraction network adopts, on top of its basic feature extraction network, a top-down feature fusion structure with lateral connections; in the basic feature extraction network module, although the top-level features carry highly abstract semantic information, repeated pooling and downsampling make them insensitive to features such as object edges and geometric information, so small target objects are harder to characterize; notably, shallow feature maps usually have high resolution and rich geometric detail, while top-level feature maps have stronger semantic abstraction and robustness to object pose and position changes but lower resolution; therefore, on the basis of the bottom-up network structure, a top-down feature fusion structure with lateral connections is adopted: the top-level feature maps are enlarged in resolution by upsampling and merged with the shallow feature maps to generate feature maps that are both high-resolution and semantically rich; after fused feature maps of the different levels are obtained through feature fusion, a convolution with a 3 × 3 kernel is applied to the fused feature map of each level to remove the aliasing effect caused by upsampling.
7. The method for detecting the small target object based on the deep learning as claimed in claim 6, wherein the spatial size of the deep features is enlarged through bilinear interpolation by upsampling; bilinear interpolation performs one-dimensional linear interpolation twice, once along the u direction and once along the v direction; suppose the point (u0, v0) of the original image corresponding to a pixel of the new image has non-integer coordinates u0 and v0; it must then fall among four pixels of the original image, namely (u′, v′), (u′, v′+1), (u′+1, v′) and (u′+1, v′+1); first, one-dimensional linear interpolation between (u′, v′) and (u′+1, v′) gives g(u0, v′); then one-dimensional linear interpolation between (u′, v′+1) and (u′+1, v′+1) gives g(u0, v′+1); finally, one-dimensional linear interpolation between the obtained points (u0, v′) and (u0, v′+1) gives g(u0, v0).
8. The method for detecting small target objects based on deep learning of claim 1, wherein in step S3 the obtained feature maps are input into the RPN to locate candidate targets; a Softmax binary classifier judges whether each obtained candidate target belongs to the foreground or the background, while a bounding-box regressor corrects the positions of the candidate targets, thereby obtaining target candidate regions; finally, the final feature maps and the candidate regions generated by the RPN are sent into the Fast-RCNN network to finally realize the classification and regression of the targets.
9. The deep learning-based small target object detection method according to claim 1, wherein in step S4 a loss function is established according to target classification and regression, and the classification loss is defined as:

L_cls(p_i, p_i*) = -log[p_i* · p_i + (1 - p_i*)(1 - p_i)]

for regression of the bounding box, the smooth-L1 loss is adopted, defined as:

L_reg(t_i, t_i*) = smooth_L1(t_i - t_i*), with smooth_L1(x) = 0.5x² if |x| < 1 and |x| - 0.5 otherwise

the loss function is therefore:

L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

wherein N_cls is the number of anchor boxes used in training the RPN; p_i is the probability that anchor i is predicted as a target, with p_i* = 1 when the prediction result is a positive sample and p_i* = 0 when it is a negative sample; N_reg is the number of anchor points; t_i is the offset predicted in the RPN training phase and t_i* is the offset relative to the real box; λ is a balance parameter between the two terms.
10. The method for detecting small target objects based on deep learning of claim 1, wherein in step S5, when pictures are repeatedly selected for training, the loss in the current iteration is used as feedback to adaptively determine the input selection of the next iteration; in the current iteration t, if the loss of small objects is negligible, i.e. the small-target loss ratio r_t = L_t^small / L_t is less than a certain threshold, the input of iteration t+1 is a stitched image; otherwise the input remains a regular image, as in the default setting.
CN202010723829.XA 2020-07-24 2020-07-24 Small target object detection method based on deep learning Pending CN111898668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723829.XA CN111898668A (en) 2020-07-24 2020-07-24 Small target object detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723829.XA CN111898668A (en) 2020-07-24 2020-07-24 Small target object detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN111898668A true CN111898668A (en) 2020-11-06

Family

ID=73189910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723829.XA Pending CN111898668A (en) 2020-07-24 2020-07-24 Small target object detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN111898668A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712500A (en) * 2020-12-28 2021-04-27 同济大学 Remote sensing image target extraction method based on deep neural network
CN112819008A (en) * 2021-01-11 2021-05-18 腾讯科技(深圳)有限公司 Method, device, medium and electronic equipment for optimizing instance detection network
CN112926637A (en) * 2021-02-08 2021-06-08 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN112949520A (en) * 2021-03-10 2021-06-11 华东师范大学 Aerial photography vehicle detection method and detection system based on multi-scale small samples
CN112950481A (en) * 2021-04-22 2021-06-11 上海大学 Water bloom shielding image data collection method based on image mosaic network
CN113128564A (en) * 2021-03-23 2021-07-16 武汉泰沃滋信息技术有限公司 Typical target detection method and system based on deep learning under complex background
CN113487551A (en) * 2021-06-30 2021-10-08 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving performance of dense target based on deep learning
CN113628250A (en) * 2021-08-27 2021-11-09 北京澎思科技有限公司 Target tracking method and device, electronic equipment and readable storage medium
CN113869361A (en) * 2021-08-20 2021-12-31 深延科技(北京)有限公司 Model training method, target detection method and related device
CN113902024A (en) * 2021-10-20 2022-01-07 浙江大立科技股份有限公司 Small-volume target detection and identification method based on deep learning and dual-band fusion
CN116912604A (en) * 2023-09-12 2023-10-20 浙江大华技术股份有限公司 Model training method, image recognition device and computer storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341517A (en) * 2017-07-07 2017-11-10 哈尔滨工业大学 The multiple dimensioned wisp detection method of Fusion Features between a kind of level based on deep learning
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN110930310A (en) * 2019-12-09 2020-03-27 中国科学技术大学 Panoramic image splicing method
CN111144398A (en) * 2018-11-02 2020-05-12 银河水滴科技(北京)有限公司 Target detection method, target detection device, computer equipment and storage medium
CN112148812A (en) * 2019-06-26 2020-12-29 丰图科技(深圳)有限公司 Method, device and equipment for extracting road center line and storage medium thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341517A (en) * 2017-07-07 2017-11-10 哈尔滨工业大学 The multiple dimensioned wisp detection method of Fusion Features between a kind of level based on deep learning
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN111144398A (en) * 2018-11-02 2020-05-12 银河水滴科技(北京)有限公司 Target detection method, target detection device, computer equipment and storage medium
CN112148812A (en) * 2019-06-26 2020-12-29 丰图科技(深圳)有限公司 Method, device and equipment for extracting road center line and storage medium thereof
CN110930310A (en) * 2019-12-09 2020-03-27 中国科学技术大学 Panoramic image splicing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUKANG CHEN et al.: "Stitcher: Feedback-driven Data Provider for Object Detection", arXiv, pages 1-7 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712500A (en) * 2020-12-28 2021-04-27 同济大学 Remote sensing image target extraction method based on deep neural network
CN112819008A (en) * 2021-01-11 2021-05-18 腾讯科技(深圳)有限公司 Method, device, medium and electronic equipment for optimizing instance detection network
CN112926637A (en) * 2021-02-08 2021-06-08 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN112949520B (en) * 2021-03-10 2022-07-26 华东师范大学 Aerial photography vehicle detection method and detection system based on multi-scale small samples
CN112949520A (en) * 2021-03-10 2021-06-11 华东师范大学 Aerial photography vehicle detection method and detection system based on multi-scale small samples
CN113128564A (en) * 2021-03-23 2021-07-16 武汉泰沃滋信息技术有限公司 Typical target detection method and system based on deep learning under complex background
CN112950481A (en) * 2021-04-22 2021-06-11 上海大学 Water bloom shielding image data collection method based on image mosaic network
CN113487551A (en) * 2021-06-30 2021-10-08 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving performance of dense target based on deep learning
CN113487551B (en) * 2021-06-30 2024-01-16 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving dense target performance based on deep learning
CN113869361A (en) * 2021-08-20 2021-12-31 深延科技(北京)有限公司 Model training method, target detection method and related device
CN113628250A (en) * 2021-08-27 2021-11-09 北京澎思科技有限公司 Target tracking method and device, electronic equipment and readable storage medium
CN113902024A (en) * 2021-10-20 2022-01-07 浙江大立科技股份有限公司 Small-volume target detection and identification method based on deep learning and dual-band fusion
CN113902024B (en) * 2021-10-20 2024-06-04 浙江大立科技股份有限公司 Small-volume target detection and identification method based on deep learning and dual-band fusion
CN116912604A (en) * 2023-09-12 2023-10-20 浙江大华技术股份有限公司 Model training method, image recognition device and computer storage medium
CN116912604B (en) * 2023-09-12 2024-01-16 浙江大华技术股份有限公司 Model training method, image recognition device and computer storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination