CN111898668A - Small target object detection method based on deep learning - Google Patents
Small target object detection method based on deep learning

Info
- Publication number: CN111898668A
- Application number: CN202010723829.XA
- Authority: CN (China)
- Prior art keywords: image, small target, training, target object, objects
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4038 — Image mosaicing, e.g. composing plane images from plane sub-images
- G06T3/4007 — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06V2201/07 — Target detection
Abstract
The invention discloses a small target object detection method based on deep learning, which can overcome problems such as insufficient detection efficiency and low accuracy in existing small target object detection methods. Firstly, images that do not contain small target objects are extracted from the COCO data set, resized, and stitched together; the stitched images and the images containing small target objects in the COCO data set form a new data set, which is divided into a training set and a test set at a ratio of 4:1. Then, the basic feature extraction network of Faster-RCNN is modified to perform feature fusion; candidate regions are selected from each level of fused features through an RPN; the improved network is trained with the training set to obtain a trained model; and finally, the test set is input into the trained model for target detection.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a small target object detection method based on deep learning.
Background
Object detection is a basic computer vision task that combines object localization and recognition: it aims to find multiple objects against the complex background of an image, provide an accurate bounding box for each object, and judge the category of the object inside the box. Object detection technology is widely applied in daily life, for example in target tracking and recognition, face recognition, text detection, pedestrian detection, medical diagnosis and intelligent monitoring systems; as a basic task, the rapid development of object detection in recent years has also promoted progress in other vision tasks. Although deep-learning-based methods work well on generic object detection data sets, they still do not solve small target detection well, mainly because small target detection faces two problems:
(1) Insufficient information: the target occupies a very small area of the image, so the pixels in the corresponding region can carry only very limited information.
(2) Scarcity of data: few images in a data set contain small objects, which leads to class imbalance across the entire training set. For example, in the COCO data set, although the approximate proportions of small, medium and large objects are 42%, 34% and 24% respectively, only about 52% of the images contain small objects, while the proportions of images containing medium and large objects are 71% and 83% respectively. In other words, in some images most objects are small objects, yet only about half of the images contain small objects at all; this imbalance severely affects training, so small target objects are detected with much lower accuracy than medium and large objects.
Small targets appear in small numbers in ordinary images, and also appear in images shot by drone cameras, communication-base-station cameras and other image capture devices mounted at greater heights; research on small target detection is therefore very important for analyzing and utilizing such images. Accordingly, further improvement is needed in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a small target object detection method based on deep learning.
The purpose of the invention is realized by the following technical scheme:
a small target object detection method based on deep learning mainly comprises the following specific steps:
step S1: extracting an image without a small target object based on the COCO data set, splicing the image after adjusting the size of the image, forming a new data set by the spliced image and the image with the small target object in the COCO data set, and carrying out image splicing according to the ratio of 4: a scale of 1 divides the data set into a training set and a test set.
Step S2: and modifying the basic feature extraction network of the Faster-RCNN to perform feature fusion.
Step S3: and selecting a candidate region through the RPN according to each level of fusion characteristics after fusion.
Step S4: and inputting the training images into an improved network for training, and constructing a loss function according to target classification and regression.
Step S5: and repeatedly selecting the training picture until the loss function is converged and storing the training model.
Step S6: and inputting the test set into the trained model for testing.
Further, in step S1, the images in the COCO data set that do not contain small target objects are resized, and the resized images are stitched together; the aim is to alleviate the size imbalance of target objects during training: the stitched image has the same size as a regular image, and in this way large and medium objects are shrunk into medium and small objects, balancing the distribution of objects of different scales during training.
Furthermore, in the image stitching process, k regular images (of size W × H) with uniform resolution are scaled by nearest-neighbor interpolation and then combined to construct a stitched image; to preserve the properties of the original images, the scaled images keep their aspect ratio, and when k = 1 the stitched image reduces to a regular image.
Further, the nearest-neighbor interpolation process is: firstly, assume that the original image has a pixel size of W × H, the scaled image has a pixel size of w × h, and the coordinates of each pixel in the original image are integers; for a pixel (x, y) in the scaled image, the corresponding point in the original image is (X, Y) = (W/w·x, H/h·y), but because of the scaling, W/w·x and H/h·y are not necessarily integers; they are therefore rounded to integers, and denoting the pixel value function by g, we have g(x, y) = g(round(W/w·x), round(H/h·y)).
As a preferred embodiment of the present invention, in step S2 the basic feature extraction network module of Faster-RCNN is the residual network ResNet-50, which comprises the input, conv1, max pooling, conv2_x, conv3_x, conv4_x and conv5_x stages. The conv1 layer contains 1 convolution operation with a 7 × 7 kernel and a stride of 2; the conv2_x, conv3_x, conv4_x and conv5_x stages contain 3, 4, 6 and 3 residual blocks respectively, and each residual block contains 3 convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 in sequence. The 3 × 3 convolution layer of the first residual block of the conv3_x, conv4_x and conv5_x stages has a stride of 2, which downsamples to reduce the resolution while the depth increases; all remaining convolution layers have a stride of 1.
As a preferred scheme of the present invention, in step S2 the improved Faster-RCNN feature extraction network adopts a top-down feature fusion structure with lateral connections on top of the basic feature extraction network. In the basic feature extraction network module, although the top-level features carry highly abstract semantic information, the repeated pooling and downsampling operations make them insensitive to features such as object edges and geometric information, so small target objects are even harder to characterize. Notably, shallow feature maps usually have high resolution and rich geometric detail, while top-level feature maps have stronger semantic abstraction and robustness to changes in object pose, position and the like, but lower resolution. Therefore, on top of the bottom-up network structure, a top-down feature fusion structure with lateral connections is adopted: the resolution of the top-level feature maps is enlarged by upsampling, and deep and shallow feature maps are merged to generate feature maps that are both high-resolution and semantically rich. After the fused feature maps of the different levels are obtained through feature fusion, a convolution with a 3 × 3 kernel is applied to the fused feature map of each level to remove the aliasing effect caused by upsampling.
Further, in the upsampling, the spatial size of the deep features is enlarged by bilinear interpolation. Bilinear interpolation performs one-dimensional linear interpolation twice, once along the u direction and once along the v direction. Suppose the point of the original image corresponding to a pixel of the new image is (u0, v0); if u0 and v0 are not integers, the point must fall among four pixels of the original image, namely (u′, v′), (u′, v′+1), (u′+1, v′) and (u′+1, v′+1). Firstly, one-dimensional linear interpolation between (u′, v′) and (u′+1, v′) gives g(u0, v′); then, one-dimensional linear interpolation between (u′, v′+1) and (u′+1, v′+1) gives g(u0, v′+1); finally, one-dimensional linear interpolation between (u0, v′) and (u0, v′+1) gives g(u0, v0).
As a preferred embodiment of the present invention, in step S3 the obtained feature maps are input into the RPN to locate candidate targets: a Softmax binary classifier judges whether each candidate target belongs to the foreground or the background, while a bounding-box regressor corrects the positions of the candidate targets, yielding the target candidate regions; finally, the final feature maps and the candidate regions generated by the RPN are fed into the Fast-RCNN network to realize the classification and regression of the targets.
As a preferable aspect of the present invention, in step S4 a loss function is established based on target classification and regression. The classification loss is defined as:

L_cls(p_i, p_i*) = -log[p_i*·p_i + (1 - p_i*)·(1 - p_i)]

For the regression of the bounding box, the smooth-L1 loss is adopted, defined as:

smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise

The loss function is therefore:

L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·smooth_L1(t_i - t_i*)

where N_cls is the number of anchor boxes used when training the RPN, p_i is the probability that anchor i is predicted to be a target, p_i* = 1 when the prediction sample is positive and p_i* = 0 when it is negative, N_reg is the number of anchor points, t_i is the offset predicted in the RPN training phase, t_i* is the offset relative to the ground-truth box, and λ is a balance parameter between the two terms.
As a preferred scheme of the present invention, in step S5, when pictures are repeatedly selected for training, the loss in the current iteration is used as feedback to adaptively determine the input selection for the next iteration; in the current iteration t, if the loss of small objects is negligible, that is, if the small-target loss ratio r_t is less than a certain threshold, the input of iteration t + 1 is stitched images; otherwise the input remains regular images under the default setting.
The working process and principle of the invention are as follows: the invention discloses a small target object detection method based on deep learning, which can overcome problems such as insufficient detection efficiency and low accuracy in existing small target object detection methods. Firstly, images that do not contain small target objects are extracted from the COCO data set, resized, and stitched together; the stitched images and the images containing small target objects in the COCO data set form a new data set, which is divided into a training set and a test set at a ratio of 4:1. Then, the basic feature extraction network of Faster-RCNN is modified to perform feature fusion; candidate regions are selected from each level of fused features through an RPN; the improved network is trained with the training set to obtain a trained model; and finally, the test set is input into the trained model for target detection.
Compared with the prior art, the invention also has the following advantages:
(1) The small target object detection method based on deep learning provided by the invention uses stitched images, turning some large and medium objects into medium and small objects and balancing the distribution of objects of different scales during training.
(2) The method improves the basic feature extraction network of Faster-RCNN, fusing shallow and deep features to generate feature maps that are both high-resolution and semantically rich, and applies a convolution operation to the fused features to remove the aliasing effect caused by upsampling.
(3) During training, the method uses the loss in the current iteration as feedback to adaptively determine the input selection for the next iteration, deciding from the small-target loss ratio whether the input of the next iteration is regular or stitched images, thereby improving the accuracy of small target object detection.
Drawings
Fig. 1 is an overall flowchart of a small target object detection method based on deep learning according to the present invention.
Fig. 2 is a schematic diagram of an improved feature extraction network provided by the present invention.
FIG. 3 is a flow chart of input image class selection in a training process provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1:
As shown in fig. 1 to 3, the present embodiment discloses a small target object detection method based on deep learning, which comprises the following steps:
Step 1: based on the COCO data set, extract images that do not contain small target objects, resize and stitch them, and form a new data set from the stitched images and the images containing small target objects in the COCO data set; divide the data set into a training set and a test set at a ratio of 4:1;
Step 2: modify the basic feature extraction network of Faster-RCNN to perform feature fusion;
Step 3: select candidate regions from each level of fused features through the RPN;
Step 4: input the training images into the improved network for training, and construct a loss function for target classification and regression;
Step 5: repeatedly select training pictures until the loss function converges, and save the trained model;
Step 6: input the test set into the trained model for testing.
In step 1, the images in the COCO data set that do not contain small target objects are resized, and the resized images are stitched together. The aim is to alleviate the size imbalance of target objects during training: the stitched image has the same size as a regular image, and in this way large and medium objects are shrunk into medium and small objects, balancing the distribution of objects of different scales during training. In regular images, objects may be blurred by photographic problems such as defocus or motion blur; although resizing a regular image to a smaller size also shrinks its medium and large objects, their outlines and details remain clearer than those of originally small objects.
In the image stitching process, k regular images (of size W × H) with uniform resolution are scaled by nearest-neighbor interpolation and then combined to form a stitched image. To preserve the properties of the original images, the scaled images keep their aspect ratio; in particular, when k = 1 the stitched image reduces to a regular image.
The principle of nearest-neighbor interpolation is: firstly, assume that the original image has a pixel size of W × H, the scaled image has a pixel size of w × h, and the coordinates of each pixel in the original image are integers. For a pixel (x, y) in the scaled image, the corresponding point in the original image is (X, Y) = (W/w·x, H/h·y), but because of the scaling, W/w·x and H/h·y are not necessarily integers; they are therefore rounded to integers, and denoting the pixel value function by g, we have g(x, y) = g(round(W/w·x), round(H/h·y)).
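By way of illustration (code is not part of the patent text), the following is a minimal Python sketch of this scaling and stitching, assuming k = 4 tiles arranged in a 2 × 2 grid; the patent does not fix the grid layout, and all function names are hypothetical:

```python
import numpy as np

def nearest_neighbor_resize(img, w, h):
    """Resize an (H0 x W0 x C) image to h x w by nearest-neighbor sampling,
    i.e. g(x, y) = g(round(W0/w * x), round(H0/h * y)) as described above."""
    H0, W0 = img.shape[:2]
    xs = np.clip(np.round(np.arange(w) * (W0 / w)).astype(int), 0, W0 - 1)
    ys = np.clip(np.round(np.arange(h) * (H0 / h)).astype(int), 0, H0 - 1)
    return img[ys[:, None], xs[None, :]]

def stitch_images(images, W, H):
    """Combine k images into one W x H stitched image; k = 1 reduces to a plain resize."""
    if len(images) == 1:
        return nearest_neighbor_resize(images[0], W, H)
    assert len(images) == 4, "this sketch assumes k = 1 or k = 4 (2 x 2 grid)"
    tiles = [nearest_neighbor_resize(im, W // 2, H // 2) for im in images]
    top = np.concatenate(tiles[:2], axis=1)       # left | right
    bottom = np.concatenate(tiles[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)  # top over bottom
```

In a full pipeline the bounding-box annotations of each tile would be scaled and offset in the same way as the pixels.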
In step 2, the basic feature extraction network module of Faster-RCNN is the residual network ResNet-50, which comprises the input, conv1, max pooling, conv2_x, conv3_x, conv4_x and conv5_x stages. The conv1 layer contains 1 convolution operation with a 7 × 7 kernel and a stride of 2; the conv2_x, conv3_x, conv4_x and conv5_x stages contain 3, 4, 6 and 3 residual blocks respectively, and each residual block contains 3 convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 in sequence. The 3 × 3 convolution layer of the first residual block of the conv3_x, conv4_x and conv5_x stages has a stride of 2, which downsamples to reduce the resolution while the depth increases; all remaining convolution layers have a stride of 1.
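As a concrete reference for this staging, here is a minimal PyTorch sketch of the backbone built on torchvision's stock ResNet-50 (an assumed implementation; the patent names no library), returning the conv2_x to conv5_x feature maps for the later fusion:

```python
import torch
import torch.nn as nn
import torchvision

class ResNet50Backbone(nn.Module):
    """Stem (7x7 s2 conv + 3x3 s2 max pooling) followed by the four residual stages."""
    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.layer1 = r.layer1  # conv2_x: 3 blocks, 256 out channels, stride 4
        self.layer2 = r.layer2  # conv3_x: 4 blocks, 512 out channels, stride 8
        self.layer3 = r.layer3  # conv4_x: 6 blocks, 1024 out channels, stride 16
        self.layer4 = r.layer4  # conv5_x: 3 blocks, 2048 out channels, stride 32

    def forward(self, x):
        c2 = self.layer1(self.stem(x))
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        return c2, c3, c4, c5
```

torchvision's ResNet-50 already places the stride-2 downsampling on the 3 × 3 convolution of the first block of layer2 to layer4, matching the staging described above.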
In step 2, the improved Faster-RCNN feature extraction network adopts a top-down feature fusion structure with lateral connections on top of the basic feature extraction network. In the basic feature extraction network module, although the top-level features carry highly abstract semantic information, the repeated pooling and downsampling operations make them insensitive to features such as object edges and geometric information, so small target objects are even harder to characterize. Notably, shallow feature maps usually have high resolution and rich geometric detail, while top-level feature maps have stronger semantic abstraction and robustness to changes in object pose, position and the like, but lower resolution. Therefore, on top of the bottom-up network structure, a top-down feature fusion structure with lateral connections is adopted: the resolution of the top-level feature maps is enlarged by upsampling, and deep and shallow feature maps are merged to generate feature maps that are both high-resolution and semantically rich. After the fused feature maps of the different levels are obtained through feature fusion, a convolution with a 3 × 3 kernel is applied to the fused feature map of each level to remove the aliasing effect caused by upsampling; the specific structure is shown in fig. 2.
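A minimal sketch of this top-down fusion in PyTorch, in the style of a feature pyramid network; the 256 fusion channels are an assumption consistent with the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions align the channel counts of C2..C5.
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # A 3x3 convolution per level removes the aliasing introduced by upsampling.
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample the deeper map and add it to the lateral map.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:],
                mode="bilinear", align_corners=False)
        return [conv(p) for conv, p in zip(self.smooth, laterals)]  # fused maps P2..P5
```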
By upsampling, the spatial size of the deep features is enlarged through bilinear interpolation. Bilinear interpolation performs one-dimensional linear interpolation twice, once along the u direction and once along the v direction. Suppose the point of the original image corresponding to a pixel of the new image is (u0, v0), where u0 and v0 are not integers; the point must then fall among four pixels of the original image, namely (u′, v′), (u′, v′+1), (u′+1, v′) and (u′+1, v′+1). Writing α = u0 - u′ and β = v0 - v′: firstly, one-dimensional linear interpolation between (u′, v′) and (u′+1, v′) gives g(u0, v′) = (1 - α)·g(u′, v′) + α·g(u′+1, v′); then, one-dimensional linear interpolation between (u′, v′+1) and (u′+1, v′+1) gives g(u0, v′+1) = (1 - α)·g(u′, v′+1) + α·g(u′+1, v′+1); finally, one-dimensional linear interpolation between (u0, v′) and (u0, v′+1) gives g(u0, v0) = (1 - β)·g(u0, v′) + β·g(u0, v′+1). Therefore:

g(u0, v0) = (1 - α)(1 - β)·g(u′, v′) + α(1 - β)·g(u′+1, v′) + β(1 - α)·g(u′, v′+1) + α·β·g(u′+1, v′+1)
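The following sketch transcribes these three interpolation steps directly for a single sample point (didactic only; boundary handling is omitted):

```python
def bilinear_sample(g, u0, v0):
    """Sample image g (indexable as g[u][v]) at a non-integer point (u0, v0);
    assumes the point lies strictly inside the image so u'+1 and v'+1 are valid."""
    u, v = int(u0), int(v0)          # (u', v'): top-left pixel of the enclosing 2x2 block
    alpha, beta = u0 - u, v0 - v     # fractional offsets along u and v
    g_u0_v = (1 - alpha) * g[u][v] + alpha * g[u + 1][v]              # along u at row v'
    g_u0_v1 = (1 - alpha) * g[u][v + 1] + alpha * g[u + 1][v + 1]     # along u at row v'+1
    return (1 - beta) * g_u0_v + beta * g_u0_v1                       # along v
```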
In step 3, the fused feature maps obtained in step 2 are input into the RPN to locate candidate targets: a Softmax binary classifier judges whether each candidate target belongs to the foreground or the background, while a bounding-box regressor corrects the positions of the candidate targets, yielding the target candidate regions; finally, the final feature maps and the candidate regions generated by the RPN are fed into the Fast-RCNN network to realize the classification and regression of the targets.
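For orientation, a hedged sketch that plugs the backbone and fusion modules from the earlier sketches into torchvision's generic FasterRCNN, which supplies the RPN and the Fast-RCNN head described here; the anchor sizes and the 81-class COCO setting are assumptions, not values stated in the patent:

```python
from collections import OrderedDict

import torch.nn as nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator

class FusedBackbone(nn.Module):
    """ResNet-50 stages + top-down fusion, exposed in the dict form FasterRCNN expects."""
    def __init__(self):
        super().__init__()
        self.body = ResNet50Backbone()   # from the earlier sketch
        self.fpn = TopDownFusion()       # from the earlier sketch
        self.out_channels = 256          # attribute required by torchvision's FasterRCNN

    def forward(self, x):
        return OrderedDict((str(i), f) for i, f in enumerate(self.fpn(self.body(x))))

# One anchor size per fused level P2..P5, three aspect ratios each (assumed values).
anchors = AnchorGenerator(sizes=((32,), (64,), (128,), (256,)),
                          aspect_ratios=((0.5, 1.0, 2.0),) * 4)
detector = FasterRCNN(FusedBackbone(), num_classes=81,  # 80 COCO classes + background
                      rpn_anchor_generator=anchors)
```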
In step 4, a loss function is established based on target classification and regression. The classification loss is defined as:

L_cls(p_i, p_i*) = -log[p_i*·p_i + (1 - p_i*)·(1 - p_i)]

For the regression of the bounding box, the smooth-L1 loss is adopted, defined as:

smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise

The loss function is therefore:

L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·smooth_L1(t_i - t_i*)

where N_cls is the number of anchor boxes used when training the RPN, p_i is the probability that anchor i is predicted to be a target, p_i* = 1 when the prediction sample is positive and p_i* = 0 when it is negative, N_reg is the number of anchor points, t_i is the offset predicted in the RPN training phase, t_i* is the offset relative to the ground-truth box, and λ is a balance parameter between the two terms.
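A PyTorch sketch of this loss in the standard Faster-RCNN form reconstructed above; the reduction details and the default λ = 10 are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def rpn_loss(p, p_star, t, t_star, n_reg, lam=10.0):
    """p: (N,) predicted objectness probabilities for the N sampled anchors;
    p_star: (N,) float labels (1 positive / 0 negative); t, t_star: (N, 4)
    predicted and ground-truth offsets; n_reg: number of anchor points."""
    n_cls = p.numel()
    # Binary cross-entropy equals -log[p* p + (1 - p*)(1 - p)], averaged over N_cls.
    l_cls = F.binary_cross_entropy(p, p_star, reduction="sum") / n_cls
    # smooth-L1 on the offsets, counted only for positive anchors (p* = 1).
    l_reg = (p_star.unsqueeze(1) *
             F.smooth_l1_loss(t, t_star, reduction="none")).sum() / n_reg
    return l_cls + lam * l_reg
```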
In step 5, when pictures are repeatedly selected for training, the loss in the current iteration is used as feedback to adaptively determine the input selection for the next iteration, as shown in fig. 3. In the current iteration t, if the loss of small objects is negligible, that is, if the small-target loss ratio r_t is less than a certain threshold, the input of iteration t + 1 is stitched images; otherwise the input remains regular images under the default setting.
The loss ratio of small objects is calculated as:

r_t = L_t^s / L_t

where L_t^s is the loss summed over small objects, a small object being one whose area A_o = w_o·h_o (with w_o and h_o its width and height) is not greater than A_s, here with A_s = 1024; L_t denotes the loss of all objects from the current image. The ratio r_t serves as feedback to guide the learning of the next iteration.
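A sketch of this feedback loop follows; the threshold value and the assumption that the model reports the small-object share of the loss alongside the total are illustrative choices, not specified by the patent:

```python
import itertools

def train_with_feedback(model, regular_loader, stitched_loader, optimizer,
                        num_iters, tau=0.1):
    """tau and the (total_loss, small_loss) return signature of `model`
    are assumptions made for this sketch."""
    regular_it = itertools.cycle(regular_loader)
    stitched_it = itertools.cycle(stitched_loader)
    use_stitched = False                      # default setting: regular images
    for t in range(num_iters):
        images, targets = next(stitched_it if use_stitched else regular_it)
        total_loss, small_loss = model(images, targets)
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()
        r_t = small_loss.item() / max(total_loss.item(), 1e-12)  # small-target loss ratio
        use_stitched = r_t < tau  # negligible small-object loss -> stitch next iteration
```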
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be regarded as an equivalent replacement and is included within the scope of protection of the present invention.
Claims (10)
1. A small target object detection method based on deep learning, characterized by comprising the following steps:
step S1: extracting, based on the COCO data set, images that do not contain small target objects, resizing and stitching them, and forming a new data set from the stitched images and the images containing small target objects in the COCO data set; dividing the data set into a training set and a test set at a ratio of 4:1;
step S2: modifying the basic feature extraction network of Faster-RCNN to perform feature fusion;
step S3: selecting candidate regions from each level of fused features through an RPN;
step S4: inputting the training images into the improved network for training, and constructing a loss function for target classification and regression;
step S5: repeatedly selecting training pictures until the loss function converges, and saving the trained model;
step S6: inputting the test set into the trained model for testing.
2. The method for detecting small target objects based on deep learning of claim 1, wherein in step S1 the images in the COCO data set that do not contain small target objects are resized and the resized images are stitched together; the aim is to alleviate the size imbalance of target objects during training: the stitched image has the same size as a regular image, and in this way large and medium objects are shrunk into medium and small objects, balancing the distribution of objects of different scales during training.
3. The method for detecting the small target object based on the deep learning as claimed in claim 2, wherein in the image stitching process, k regular images (of size W × H) with uniform resolution are scaled by nearest-neighbor interpolation and then combined to form a stitched image; to preserve the properties of the original images, the scaled images keep their aspect ratio, and when k = 1 the stitched image reduces to a regular image.
4. The deep learning-based small target object detection method according to claim 3, wherein the nearest-neighbor interpolation process is: assuming that the original image has a pixel size of W × H, the scaled image has a pixel size of w × h, and the coordinates of each pixel in the original image are integers; for a pixel (x, y) in the scaled image, the corresponding point in the original image is (X, Y) = (W/w·x, H/h·y), but because of the scaling, W/w·x and H/h·y are not necessarily integers; they are therefore rounded to integers, and denoting the pixel value function by g, we have g(x, y) = g(round(W/w·x), round(H/h·y)).
5. The deep learning-based small target object detection method according to claim 1, wherein in step S2 the basic feature extraction network module of Faster-RCNN is the residual network ResNet-50, comprising the input, conv1, max pooling, conv2_x, conv3_x, conv4_x and conv5_x stages, wherein the conv1 layer comprises 1 convolution operation with a 7 × 7 kernel and a stride of 2; the conv2_x, conv3_x, conv4_x and conv5_x stages comprise 3, 4, 6 and 3 residual blocks respectively, each residual block comprising 3 convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 in sequence; the 3 × 3 convolution layer of the first residual block of the conv3_x, conv4_x and conv5_x stages has a stride of 2 in order to downsample, reducing the resolution while increasing the depth, and all remaining convolution layers have a stride of 1.
6. The method for detecting small target objects based on deep learning of claim 1, wherein in step S2 the improved Faster-RCNN feature extraction network adopts, on top of its basic feature extraction network, a top-down feature fusion structure with lateral connections; in the basic feature extraction network module, although the top-level features carry highly abstract semantic information, the repeated pooling and downsampling operations make them insensitive to features such as object edges and geometric information, so small target objects are even harder to characterize; shallow feature maps usually have high resolution and rich geometric detail, while top-level feature maps have stronger semantic abstraction and robustness to changes in object pose, position and the like, but lower resolution; therefore, on top of the bottom-up network structure, a top-down feature fusion structure with lateral connections is adopted, the resolution of the top-level feature maps is enlarged by upsampling, and deep and shallow feature maps are merged to generate feature maps that are both high-resolution and semantically rich; after the fused feature maps of the different levels are obtained through feature fusion, a convolution with a 3 × 3 kernel is applied to the fused feature map of each level to remove the aliasing effect caused by upsampling.
7. The method for detecting the small target object based on the deep learning as claimed in claim 6, wherein the spatial size of the deep features is enlarged by bilinear interpolation during upsampling; the bilinear interpolation performs one-dimensional linear interpolation twice, once along the u direction and once along the v direction; supposing that the point of the original image corresponding to a pixel of the new image is (u0, v0), and u0 and v0 are not integers, the point must fall among four pixels of the original image, namely (u′, v′), (u′, v′+1), (u′+1, v′) and (u′+1, v′+1); firstly, one-dimensional linear interpolation between (u′, v′) and (u′+1, v′) gives g(u0, v′); then, one-dimensional linear interpolation between (u′, v′+1) and (u′+1, v′+1) gives g(u0, v′+1); finally, one-dimensional linear interpolation between (u0, v′) and (u0, v′+1) gives g(u0, v0).
8. The method for detecting small target objects based on deep learning of claim 1, wherein in step S3 the obtained feature maps are input into the RPN to locate candidate targets: a Softmax binary classifier judges whether each candidate target belongs to the foreground or the background, while a bounding-box regressor corrects the positions of the candidate targets, yielding the target candidate regions; finally, the final feature maps and the candidate regions generated by the RPN are fed into the Fast-RCNN network to realize the classification and regression of the targets.
9. The deep learning-based small target object detection method according to claim 1, wherein in step S4 a loss function is established according to target classification and regression, the classification loss being defined as:

L_cls(p_i, p_i*) = -log[p_i*·p_i + (1 - p_i*)·(1 - p_i)]

the smooth-L1 loss being adopted for the regression of the bounding box, defined as:

smooth_L1(x) = 0.5·x², if |x| < 1; |x| - 0.5, otherwise

the loss function therefore being:

L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·smooth_L1(t_i - t_i*)

wherein N_cls is the number of anchor boxes used when training the RPN, p_i is the probability that anchor i is predicted to be a target, p_i* = 1 when the prediction sample is positive and p_i* = 0 when it is negative, N_reg is the number of anchor points, t_i is the offset predicted in the RPN training phase, t_i* is the offset relative to the ground-truth box, and λ is a balance parameter between the two terms.
10. The method for detecting small target objects based on deep learning of claim 1, wherein in step S5, when pictures are repeatedly selected for training, the loss in the current iteration is used as feedback to adaptively determine the input selection for the next iteration; in the current iteration t, if the loss of small objects is negligible, that is, if the small-target loss ratio r_t is less than a certain threshold, the input of iteration t + 1 is stitched images; otherwise the input remains regular images under the default setting.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010723829.XA CN111898668A (en) | 2020-07-24 | 2020-07-24 | Small target object detection method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111898668A true CN111898668A (en) | 2020-11-06 |
Family
ID=73189910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010723829.XA Pending CN111898668A (en) | 2020-07-24 | 2020-07-24 | Small target object detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111898668A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341517A (en) * | 2017-07-07 | 2017-11-10 | 哈尔滨工业大学 | The multiple dimensioned wisp detection method of Fusion Features between a kind of level based on deep learning |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
CN111144398A (en) * | 2018-11-02 | 2020-05-12 | 银河水滴科技(北京)有限公司 | Target detection method, target detection device, computer equipment and storage medium |
CN112148812A (en) * | 2019-06-26 | 2020-12-29 | 丰图科技(深圳)有限公司 | Method, device and equipment for extracting road center line and storage medium thereof |
CN110930310A (en) * | 2019-12-09 | 2020-03-27 | 中国科学技术大学 | Panoramic image splicing method |
Non-Patent Citations (1)
Title |
---|
YUKANG CHEN et al.: "Stitcher: Feedback-driven Data Provider for Object Detection", arXiv, pages 1-7 *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112712500A (en) * | 2020-12-28 | 2021-04-27 | 同济大学 | Remote sensing image target extraction method based on deep neural network |
CN112819008A (en) * | 2021-01-11 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Method, device, medium and electronic equipment for optimizing instance detection network |
CN112926637A (en) * | 2021-02-08 | 2021-06-08 | 天津职业技术师范大学(中国职业培训指导教师进修中心) | Method for generating text detection training set |
CN112949520B (en) * | 2021-03-10 | 2022-07-26 | 华东师范大学 | Aerial photography vehicle detection method and detection system based on multi-scale small samples |
CN112949520A (en) * | 2021-03-10 | 2021-06-11 | 华东师范大学 | Aerial photography vehicle detection method and detection system based on multi-scale small samples |
CN113128564A (en) * | 2021-03-23 | 2021-07-16 | 武汉泰沃滋信息技术有限公司 | Typical target detection method and system based on deep learning under complex background |
CN112950481A (en) * | 2021-04-22 | 2021-06-11 | 上海大学 | Water bloom shielding image data collection method based on image mosaic network |
CN113487551A (en) * | 2021-06-30 | 2021-10-08 | 佛山市南海区广工大数控装备协同创新研究院 | Gasket detection method and device for improving performance of dense target based on deep learning |
CN113487551B (en) * | 2021-06-30 | 2024-01-16 | 佛山市南海区广工大数控装备协同创新研究院 | Gasket detection method and device for improving dense target performance based on deep learning |
CN113869361A (en) * | 2021-08-20 | 2021-12-31 | 深延科技(北京)有限公司 | Model training method, target detection method and related device |
CN113628250A (en) * | 2021-08-27 | 2021-11-09 | 北京澎思科技有限公司 | Target tracking method and device, electronic equipment and readable storage medium |
CN113902024A (en) * | 2021-10-20 | 2022-01-07 | 浙江大立科技股份有限公司 | Small-volume target detection and identification method based on deep learning and dual-band fusion |
CN113902024B (en) * | 2021-10-20 | 2024-06-04 | 浙江大立科技股份有限公司 | Small-volume target detection and identification method based on deep learning and dual-band fusion |
CN116912604A (en) * | 2023-09-12 | 2023-10-20 | 浙江大华技术股份有限公司 | Model training method, image recognition device and computer storage medium |
CN116912604B (en) * | 2023-09-12 | 2024-01-16 | 浙江大华技术股份有限公司 | Model training method, image recognition device and computer storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |