CN115272648B - Multi-level receptive field expanding method and system for small target detection - Google Patents
- Publication number: CN115272648B (application number CN202211209625.XA)
- Authority: CN (China)
- Prior art keywords: layer, feature, receptive field, features, loss
- Legal status: Active (as listed by Google; not a legal conclusion)
Classifications
All codes fall under G (Physics) › G06 (Computing; calculating or counting) › G06V (Image or video recognition or understanding):
- G06V10/20: Image preprocessing
- G06V10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
- G06V10/766: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
- G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G06V2201/07: Target detection
Abstract
The invention provides a multi-level receptive field expanding method and system for small target detection. A Swin Transformer is introduced as the backbone network of the model, and its hierarchy, locality and translation invariance are used to extract the features of small targets. Based on the differing output features of each stage of the backbone network, a multi-level receptive field expansion network is designed to further process the backbone outputs, so that small-target information is not lost; in addition, the receptive field amplification module effectively expands the receptive field. According to task requirements, the structure of the receptive field amplification module in each layer can be flexibly adjusted to match the receptive fields required by targets of different scales and to obtain rich context information. Furthermore, a joint loss combining the GIoU loss and the BIoU loss is proposed to improve target localization. Comparative tests show that the invention performs well on small target detection.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a multi-level receptive field expanding method and a multi-level receptive field expanding system for small target detection.
Background
Object detection is an important research direction in computer vision and the basis of other high-level vision tasks. Although deep-learning-based object detection algorithms have developed rapidly, detecting small targets remains difficult. In autonomous driving, accurately and quickly detecting small targets that affect traffic helps ensure travel safety; in industrial automation, accurately locating and identifying small defects on materials helps ensure production efficiency; in satellite remote sensing, small target detection can help address problems such as illegal fishing boats and illegal cargo transfer. A multi-level receptive field amplification network for small target detection therefore has broad application value and academic research value.
In object detection, the most widely used dataset, COCO, adopts an absolute size definition: objects with an area smaller than 32 × 32 pixels are small objects, and this standard is widely followed. On COCO, detection accuracy for small objects is generally lower than for objects of normal size, so small objects are more challenging. Specifically, small target detection faces four main challenges. First, the features of small objects are hard to extract: low-resolution small objects lack visual information, so discriminative features are difficult to obtain. Second, because of downsampling, the features of small objects may shrink to a single point on deep feature layers or even vanish. Third, receptive fields are mismatched: large receptive fields suit large objects, while small receptive fields benefit small objects. Finally, small objects require higher localization accuracy: a small bounding-box offset strongly affects small target detection, so small objects are hard to localize precisely and may be missed.
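To make the absolute size definition and the second challenge concrete, the following Python sketch classifies an object by the COCO area thresholds (the 96 × 96 medium/large boundary is the standard COCO value; COCO measures the annotated area, which box width × height only approximates) and estimates how many feature-map cells an object spans after downsampling. The strides 4, 8, 16 and 32 are an assumption typical of a four-stage hierarchical backbone, not values stated in this document.

```python
# COCO absolute-size convention (area thresholds in pixels^2).
SMALL_MAX_AREA = 32 * 32
MEDIUM_MAX_AREA = 96 * 96


def coco_size_category(width: float, height: float) -> str:
    """Classify an object by area; box width x height stands in for the
    annotated area here."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def footprint_cells(size_px: float, stride: int) -> float:
    """Approximate side length, in feature-map cells, of an object that is
    `size_px` pixels wide on a map downsampled by `stride`."""
    return size_px / stride


# A 16 x 16 object on feature maps with the assumed strides.
for stride in (4, 8, 16, 32):
    print(stride, footprint_cells(16, stride), coco_size_category(16, 16))
```

A 16 × 16 object covers 4 × 4 cells at stride 4 but only half a cell at stride 32, which is why its features can collapse to a point on deep layers.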
Therefore, it is necessary to provide a multi-level receptive field expanding method and system for small target detection to solve the above technical problems.
Disclosure of Invention
Therefore, embodiments of the present invention provide a multi-level receptive field expansion method and system for small target detection to solve the above technical problems.
The invention provides a multi-level receptive field expanding method for small target detection, wherein the method comprises the following steps:
Step one: performing preprocessing suitable for small target detection on an input image from the COCO dataset;
Step two: introducing a Swin Transformer as the backbone network and using its hierarchical structure to extract multiple layers of features from the input image, each layer of features corresponding to one feature layer;
Step three: constructing a multi-level receptive field feature fusion network whose receptive field feature amplification modules match the receptive fields required by each feature layer of the Swin Transformer and supplement the shallow prediction features; after matching, each feature layer has a plurality of receptive field feature amplification modules;
Step four: taking a linear combination of the GIoU loss and the BIoU loss as the bounding box regression loss, and using this regression loss to improve target localization;
Step five: assigning targets of different scales in the input image to feature layers with different receptive fields, and locating and identifying small targets with the shallow prediction feature layers of the detection model to obtain the localization and identification results for the small targets.
The invention provides a multi-level receptive field expansion method for small target detection. A Swin Transformer is introduced as the backbone network of the model, and its hierarchy, locality and translation invariance are used to extract the features of small targets. Based on the differing output features of each stage of the backbone network, a multi-level receptive field expansion network is designed to further process the backbone outputs, so that small-target information is not lost; in addition, the receptive field amplification module effectively expands the receptive field. According to task requirements, the structure of the receptive field amplification module in each layer can be flexibly adjusted to match the receptive fields required by targets of different scales and to obtain rich context information. Furthermore, the proposed combined loss of the GIoU loss and the BIoU loss improves target localization. Comparative tests show that the invention performs well on small target detection.
In step one of the method, the preprocessing comprises:
designing a data augmentation strategy that rescales the input image and uses multi-scale training to increase the diversity of sample scales;
and applying random horizontal flipping to the input images of the COCO dataset for data augmentation, to improve the generalization ability of the model.
In the method, the Swin Transformer has a four-layer (four-stage) structure that extracts four features $C_2, C_3, C_4, C_5$ of different scales and depths; a convolution adjusts their channel numbers to give the layer features $F_i$, $i \in \{2, 3, 4, 5\}$.
The multi-level receptive field feature fusion network outputs four features of different scales, $P_i$, $i \in \{2, 3, 4, 5\}$.
The receptive field feature amplification modules on the four feature layers of the fusion network are denoted $R_i$, $i \in \{2, 3, 4, 5\}$.
The correspondence is:
$$P_5 = R_5^{\,n}(F_5), \qquad P_i = R_i^{\,n}(F_i) \oplus \mathrm{Up}(P_{i+1}), \quad i = 4, 3, 2,$$
where $P_2, P_3, P_4, P_5$ are the layer 2 to layer 5 output features; $F_2, F_3, F_4, F_5$ are the layer 2 to layer 5 features; $R_2, R_3, R_4, R_5$ are the receptive field feature amplification modules on the 2nd to 5th feature layers; $R_i^{\,n}$ denotes $n$ such modules applied in sequence, $n$ being the number of receptive field feature amplification modules in a single feature layer; $\oplus$ denotes feature fusion; and $\mathrm{Up}(\cdot)$ denotes 2× upsampling by nearest-neighbour interpolation.
In the method, the receptive field feature amplification module comprises several basic units. On the 4th feature layer, the layer 4 feature $F_4$ of the Swin Transformer backbone passes through the 1st basic unit $U_1$ of the module to give the first basic unit output feature $O_1$, then through the 2nd basic unit $U_2$ to give the second basic unit output feature $O_2$, and finally through the 3rd basic unit $U_3$, whose result is merged with $F_4$ by a residual connection to give the third basic unit output feature $O_3$.
The corresponding expression is:
$$O_1 = U_1(F_4), \qquad O_2 = U_2(O_1), \qquad O_3 = U_3(O_2) + F_4,$$
where the third basic unit output feature $O_3$ is the output of the first receptive field feature amplification module on the 4th feature layer.
In the method, the first basic unit output feature $O_1$ is expressed as:
$$O_1 = \mathrm{DCBA}_{k,r}\big(\mathrm{CBA}(F_4)\big), \qquad \mathrm{CBA}(x) = \sigma\big(\mathrm{BN}(\mathrm{Conv}(x))\big), \qquad \mathrm{DCBA}_{k,r}(x) = \sigma\big(\mathrm{BN}(\mathrm{DConv}_{k,r}(x))\big),$$
where $F_4$ is the layer 4 feature; $\mathrm{Conv}$ denotes a convolution; $\mathrm{DConv}_{k,r}$ denotes a hole (dilated) convolution with a $k \times k$ kernel; $r$ is the dilation rate of the hole convolution; $\mathrm{BN}$ is batch normalization; $\sigma$ is the activation function; $\mathrm{CBA}$ is the convolution including batch normalization and the activation function; and $\mathrm{DCBA}_{k,r}$ is the hole convolution including batch normalization and the activation function.
In the method, the bounding box regression loss $L_{reg}$ is a linear combination of the GIoU loss $L_{GIoU}$ and the BIoU loss $L_{BIoU}$, where $B$ denotes the predicted bounding box; $B^{gt}$ denotes the annotation (ground-truth) box; $b = (x, y, w, h)$ denotes the bounding box position, $(x, y)$ being the coordinates of the box center and $w$, $h$ its width and height; $C$ denotes the area of the smallest box enclosing both the predicted box and the annotation box; $\mathrm{smooth}_{L1}$ denotes the Smooth L1 loss; and $\mathrm{IoU}$ denotes the overlap (intersection-over-union) calculation.
In step five, for the identification task, the Focal loss is used to address the imbalance between positive and negative samples:
$$L_{FL} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \qquad p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise,} \end{cases}$$
where $L_{FL}$ is the Focal loss, $p$ is the prediction score, $y$ is the ground-truth label, $\alpha$ is the factor balancing positive and negative samples ($\alpha_t = \alpha$ for $y = 1$ and $1 - \alpha$ otherwise), and $\gamma$ is the modulating (adjustment) factor.
In step five, the total loss function $L$ of the detection model for localization and identification contains two weighting coefficients, both of which are hyperparameters.
The invention also provides a multi-level receptive field expanding system for small target detection, wherein the system comprises:
a pre-processing module to:
perform preprocessing suitable for small target detection on an input image from the COCO dataset;
a feature extraction module to:
introduce a Swin Transformer as the backbone network and use its hierarchical structure to extract multiple layers of features from the input image, each layer of features corresponding to one feature layer;
a network construction module to:
construct a multi-level receptive field feature fusion network whose receptive field feature amplification modules match the receptive fields required by each feature layer of the Swin Transformer and supplement the shallow prediction features, each feature layer having a plurality of receptive field feature amplification modules after matching;
a loss determination module to:
take a linear combination of the GIoU loss and the BIoU loss as the bounding box regression loss and use this loss to improve target localization;
a result output module to:
assign targets of different scales in the input image to feature layers with different receptive fields, and locate and identify small targets with the shallow prediction feature layers of the detection model to obtain the localization and identification results for the small targets.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of embodiments of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a multi-level receptive field expansion method for small target detection proposed by the present invention;
fig. 2 is a schematic structural diagram of a multi-level receptive field expanding system for small target detection according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a multi-level receptive field expanding method for small target detection, wherein the method includes the following steps:
S101: performing preprocessing suitable for small object detection on the input images of the COCO dataset.
Specifically, most images in the COCO dataset show everyday scenes with complex backgrounds. The dataset has 80 classes, and each image contains on average 3.5 classes and 7.7 instances. In the COCO dataset, targets with an area smaller than 32 × 32 pixels are defined as small targets, and small objects account for 41% of the objects.
In step S101, the preprocessing includes the steps of:
S1011: designing a data augmentation strategy that rescales the input image and uses multi-scale training to increase the diversity of sample scales.
In this step, the scaled image size is (480, 1333).
S1012: applying random horizontal flipping to the input images of the COCO dataset for data augmentation, to improve the generalization ability of the model.
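The two augmentation steps can be sketched in Python as follows. The sketch assumes that (480, 1333) means a target short side of 480 pixels with the long side capped at 1333; the other candidate short sides used for multi-scale sampling are hypothetical, since the document does not list them. Boxes are in (x1, y1, x2, y2) pixel coordinates.

```python
import random
from typing import List, Optional, Sequence, Tuple

Box = List[float]  # [x1, y1, x2, y2] in pixels


def rescale_size(w: int, h: int, short_side: int,
                 max_long_side: int = 1333) -> Tuple[int, int, float]:
    """Aspect-preserving resize: the short side becomes `short_side` unless
    that would make the long side exceed `max_long_side`."""
    scale = min(short_side / min(w, h), max_long_side / max(w, h))
    return round(w * scale), round(h * scale), scale


def hflip_boxes(boxes: Sequence[Box], img_w: float) -> List[Box]:
    """Mirror boxes for a horizontally flipped image of width `img_w`."""
    return [[img_w - x2, y1, img_w - x1, y2] for x1, y1, x2, y2 in boxes]


def augment(w: int, h: int, boxes: Sequence[Box],
            short_sides: Sequence[int] = (480, 640, 800),  # hypothetical scales
            flip_prob: float = 0.5,
            rng: Optional[random.Random] = None) -> Tuple[Tuple[int, int], List[Box]]:
    if rng is None:
        rng = random.Random()
    short = rng.choice(list(short_sides))      # multi-scale training
    new_w, new_h, s = rescale_size(w, h, short)
    out = [[c * s for c in b] for b in boxes]  # boxes follow the resize
    if rng.random() < flip_prob:               # random horizontal flip
        out = hflip_boxes(out, new_w)
    return (new_w, new_h), out


print(augment(640, 480, [[10, 5, 30, 25]], rng=random.Random(0)))
```

Flipping only mirrors the x coordinates, so box widths and heights, and therefore the small/medium/large category of each object, are unchanged by S1012.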
S102, introducing a Swin Transformer as a backbone network, and performing feature extraction on the input image by using a hierarchical structure of the Swin Transformer to obtain multiple layers of features, wherein each layer of features corresponds to a feature layer.
In step S102, the Swin Transformer has a four-layer (four-stage) structure. The multi-level receptive field feature fusion network outputs four features of different scales, $P_i$, $i \in \{2, 3, 4, 5\}$, and the receptive field feature amplification modules on its four feature layers are denoted $R_i$, $i \in \{2, 3, 4, 5\}$.
First, the preprocessed image is input, and four features $C_2, C_3, C_4, C_5$ of different scales and depths are extracted; a convolution adjusts their channel numbers to give the layer features $F_i$, $i \in \{2, 3, 4, 5\}$.
Then the layer 5 feature $F_5$ passes through $n$ layer-5 receptive field feature amplification modules to give the layer 5 output feature $P_5 = R_5^{\,n}(F_5)$.
Next, $P_5$ is upsampled by 2× nearest-neighbour interpolation to give $\mathrm{Up}(P_5)$, which is fused with the feature obtained by passing the layer 4 feature $F_4$ through $n$ layer-4 receptive field feature amplification modules, giving the layer 4 output feature $P_4 = R_4^{\,n}(F_4) \oplus \mathrm{Up}(P_5)$. In the same way, $P_4$ is upsampled and fused with the layer 3 feature $F_3$ after its receptive field feature amplification modules to give the layer 3 output feature $P_3$, and the same operation then gives the layer 2 output feature $P_2$.
Specifically, the correspondence is:
$$P_5 = R_5^{\,n}(F_5), \qquad P_i = R_i^{\,n}(F_i) \oplus \mathrm{Up}(P_{i+1}), \quad i = 4, 3, 2,$$
where $P_2$ to $P_5$ are the layer 2 to layer 5 output features; $F_2$ to $F_5$ are the layer 2 to layer 5 features; $R_2$ to $R_5$ are the receptive field feature amplification modules on the 2nd to 5th feature layers; $R_i^{\,n}$ denotes $n$ such modules applied in sequence, $n$ being the number of receptive field feature amplification modules in a single feature layer; $\oplus$ denotes feature fusion; and $\mathrm{Up}(\cdot)$ denotes 2× upsampling by nearest-neighbour interpolation.
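The top-down pathway of S102 can be sketched in plain Python on single-channel grids. This illustration rests on assumptions: feature maps are tiny 2-D lists, the fusion step is taken to be element-wise addition, each level has half the resolution of the level below it, and each receptive field feature amplification module is passed in as a function (`rfa`, a hypothetical hook; the identity is used in the example).

```python
from typing import Callable, Dict, List

Grid = List[List[float]]  # one single-channel feature map


def upsample2x_nearest(x: Grid) -> Grid:
    """2x nearest-neighbour upsampling: each cell becomes a 2 x 2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(a: Grid, b: Grid) -> Grid:
    """Assumed fusion: element-wise addition of equally sized maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(feats: Dict[int, Grid], rfa: Callable[[int, Grid], Grid],
             n: int) -> Dict[int, Grid]:
    """feats[i] is the layer-i feature (i = 2..5); returns outputs P2..P5."""
    def modules(level: int, x: Grid) -> Grid:
        for _ in range(n):  # n modules in sequence on one feature layer
            x = rfa(level, x)
        return x

    outs = {5: modules(5, feats[5])}
    for level in (4, 3, 2):
        outs[level] = fuse(modules(level, feats[level]),
                           upsample2x_nearest(outs[level + 1]))
    return outs


def identity(level: int, x: Grid) -> Grid:
    return x


feats = {5: [[1.0]], 4: [[0.0] * 2 for _ in range(2)],
         3: [[0.0] * 4 for _ in range(4)], 2: [[0.0] * 8 for _ in range(8)]}
print(top_down(feats, identity, n=2)[4])  # the layer-5 value spread over 2 x 2 cells
```

Because each output is computed from the one above it, deep context from layer 5 reaches the shallow layer 2 map, which is where small targets are predicted.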
S103: constructing a multi-level receptive field feature fusion network whose receptive field feature amplification modules match the receptive fields required by each feature layer of the Swin Transformer and supplement the shallow prediction features; after matching, each feature layer has a plurality of receptive field feature amplification modules.
In the invention, the receptive field feature amplification module comprises several basic units. Taking the 4th feature layer as an example, the layer 4 feature $F_4$ of the Swin Transformer backbone passes through the 1st basic unit $U_1$ of the module to give the first basic unit output feature $O_1$, then through the 2nd basic unit $U_2$ to give the second basic unit output feature $O_2$, and finally through the 3rd basic unit $U_3$, whose result is merged with $F_4$ by a residual connection to give the third basic unit output feature $O_3$.
The corresponding expression is:
$$O_1 = U_1(F_4), \qquad O_2 = U_2(O_1), \qquad O_3 = U_3(O_2) + F_4,$$
where the third basic unit output feature $O_3$ is the output of the first receptive field feature amplification module on the 4th feature layer.
As a supplementary note, each of the other receptive field feature amplification modules on the 4th feature layer performs the same operation, taking the output of the preceding module as its input feature. Each module on the 4th feature layer has 3 basic units; the modules on the 5th, 3rd and 2nd feature layers have 4, 3 and 1 basic units respectively and operate in the same way as those on the 4th feature layer.
Each basic unit applies a convolution followed by a hole (dilated) convolution; for the first basic unit on the 4th feature layer, with input the layer 4 feature $F_4$ and output $O_1$,
$$O_1 = \mathrm{DCBA}_{k,r}\big(\mathrm{CBA}(F_4)\big), \qquad \mathrm{CBA}(x) = \sigma\big(\mathrm{BN}(\mathrm{Conv}(x))\big), \qquad \mathrm{DCBA}_{k,r}(x) = \sigma\big(\mathrm{BN}(\mathrm{DConv}_{k,r}(x))\big),$$
where $\mathrm{Conv}$ denotes a convolution; $\mathrm{DConv}_{k,r}$ denotes a hole convolution with a $k \times k$ kernel; $r$ is the dilation rate of the hole convolution; $\mathrm{BN}$ is batch normalization; $\sigma$ is the activation function; $\mathrm{CBA}$ is the convolution including batch normalization and the activation function; and $\mathrm{DCBA}_{k,r}$ is the hole convolution including batch normalization and the activation function.
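The basic unit and the residual stacking of S103 can be illustrated in one dimension with a single channel. In this Python sketch the plain convolution is modelled as a 1 × 1 convolution (an assumption), which on one channel reduces to a scalar weight; batch normalization is omitted for brevity; ReLU stands in for the unspecified activation; and the 3-tap hole-convolution kernel is an assumption. Only the layer-4 dilation rates 1, 3 and 9 come from this document.

```python
from typing import List, Sequence

Signal = List[float]


def relu(x: Sequence[float]) -> Signal:
    return [max(0.0, v) for v in x]


def pointwise(x: Sequence[float], weight: float) -> Signal:
    """1 x 1 convolution on a single channel: a scalar weight."""
    return [weight * v for v in x]


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], rate: int) -> Signal:
    """'Same'-size 1-D hole (dilated) convolution with zero padding,
    written as cross-correlation as deep-learning frameworks do."""
    centre = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - centre) * rate  # taps are `rate` samples apart
            if 0 <= pos < len(x):
                acc += w * x[pos]
        out.append(acc)
    return out


def basic_unit(x: Sequence[float], rate: int, w1: float = 1.0,
               kernel: Sequence[float] = (1.0, 1.0, 1.0)) -> Signal:
    """Conv + activation, then hole conv + activation (BN omitted)."""
    return relu(dilated_conv1d(relu(pointwise(x, w1)), kernel, rate))


def rfa_module(x: Sequence[float], rates: Sequence[int]) -> Signal:
    """Basic units in sequence, then a residual connection to the input."""
    y = list(x)
    for r in rates:
        y = basic_unit(y, r)
    return [a + b for a, b in zip(y, x)]


impulse = [0.0] * 9
impulse[4] = 1.0
print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 3))  # non-zero at indices 1, 4, 7
```

Feeding an impulse through `rfa_module` with rates (1, 3, 9) spreads it over 27 consecutive samples, while the residual term keeps the original centre value.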
In addition, the dilation rate of each basic unit is carefully designed. A hole convolution samples its input discontinuously, which can cause a gridding (checkerboard) effect and the loss of detailed information. To avoid this, different dilation rates are set for the basic units on different feature layers, so that information is fully used and the receptive fields required by targets of different scales are matched. The dilation rates of the basic units are set to 1, 3, 9 and 9 on the 5th feature layer; 1, 3 and 9 on the 4th; 1, 2 and 3 on the 3rd; and 1 on the 2nd.
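The effect of these rates can be checked with a short Python sketch that, along one axis, collects the input offsets reachable by stacking hole convolutions in one receptive field feature amplification module. It assumes 3 × 3 hole-convolution kernels (so 3 taps per axis), ignores the backbone, and measures sizes in cells of the feature layer concerned. A gap in the reachable offsets is the gridding (checkerboard) pattern described above.

```python
from typing import Sequence, Set

# Dilation rates per feature layer, as listed in this document.
RATES = {5: (1, 3, 9, 9), 4: (1, 3, 9), 3: (1, 2, 3), 2: (1,)}


def reachable_offsets(rates: Sequence[int], kernel_size: int = 3) -> Set[int]:
    """1-D input offsets that can influence one output position."""
    half = kernel_size // 2
    offsets = {0}
    for r in rates:
        taps = [t * r for t in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def receptive_field(rates: Sequence[int], kernel_size: int = 3) -> int:
    """Each hole conv widens the field by (kernel_size - 1) * rate."""
    return 1 + sum((kernel_size - 1) * r for r in rates)


def has_gaps(rates: Sequence[int], kernel_size: int = 3) -> bool:
    offs = reachable_offsets(rates, kernel_size)
    return len(offs) != max(offs) - min(offs) + 1


for layer, rates in RATES.items():
    print(layer, rates, receptive_field(rates), has_gaps(rates))
# A counter-example with a repeated even rate leaves holes:
print((2, 2, 2), has_gaps((2, 2, 2)))
```

Under these assumptions all four configurations reach contiguous offsets, with single-module receptive fields of 45, 27, 13 and 3 cells for layers 5 to 2, whereas (2, 2, 2) reaches only even offsets. Stacking n modules on a layer enlarges the field further.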
S104: taking a linear combination of the GIoU loss and the BIoU loss as the bounding box regression loss, and using this regression loss to improve target localization.
In step S104, the bounding box regression loss $L_{reg}$ is a linear combination of the GIoU loss $L_{GIoU}$ and the BIoU loss $L_{BIoU}$, where $B$ denotes the predicted bounding box; $B^{gt}$ denotes the annotation (ground-truth) box; $b = (x, y, w, h)$ denotes the bounding box position, $(x, y)$ being the coordinates of the box center and $w$, $h$ its width and height; $C$ denotes the area of the smallest box enclosing both the predicted box and the annotation box; $\mathrm{smooth}_{L1}$ denotes the Smooth L1 loss; and $\mathrm{IoU}$ denotes the overlap (intersection-over-union) calculation.
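A minimal Python sketch of the two regression terms follows. It assumes that BIoU refers to the Bounded IoU loss of Tychsen-Smith and Petersson, which applies Smooth L1 to per-coordinate IoU bounds on (x, y, w, h) and matches the symbols defined for S104; the Smooth L1 threshold `beta` and the combination weights `w_giou` and `w_biou` are hypothetical placeholders, since their values are not given here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def to_corners(b: Box) -> Box:
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def giou_loss(pred: Box, gt: Box) -> float:
    """1 - GIoU, with GIoU = IoU - (C - union) / C and C the enclosing-box area."""
    px1, py1, px2, py2 = to_corners(pred)
    gx1, gy1, gx2, gy2 = to_corners(gt)
    inter = (max(0.0, min(px2, gx2) - max(px1, gx1))
             * max(0.0, min(py2, gy2) - max(py1, gy1)))
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    c_area = (max(px2, gx2) - min(px1, gx1)) * (max(py2, gy2) - min(py1, gy1))
    return 1.0 - (inter / union - (c_area - union) / c_area)


def smooth_l1(x: float, beta: float = 1.0) -> float:
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def bounded_iou_loss(pred: Box, gt: Box, beta: float = 1.0) -> float:
    """Smooth L1 on (1 - IoU upper bound) for each of x, y, w, h."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    dx, dy = abs(px - gx), abs(py - gy)
    bounds = (
        max(0.0, (gw - 2 * dx) / (gw + 2 * dx)),  # centre offset in x
        max(0.0, (gh - 2 * dy) / (gh + 2 * dy)),  # centre offset in y
        min(pw / gw, gw / pw),                    # width mismatch
        min(ph / gh, gh / ph),                    # height mismatch
    )
    return sum(smooth_l1(1.0 - b, beta) for b in bounds)


def regression_loss(pred: Box, gt: Box,
                    w_giou: float = 1.0, w_biou: float = 1.0) -> float:
    """Linear combination of the two terms (weights hypothetical)."""
    return w_giou * giou_loss(pred, gt) + w_biou * bounded_iou_loss(pred, gt)


print(regression_loss((0.5, 0.5, 1.0, 1.0), (0.5, 0.5, 2.0, 1.0)))
```

The GIoU term still yields a gradient for non-overlapping boxes through the enclosing area C, while the bounded term penalizes each coordinate separately, which is useful for small boxes where a few pixels of offset change the IoU sharply.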
S105: assigning targets of different scales in the input image to feature layers with different receptive fields, and locating and identifying small targets with the shallow prediction feature layers of the detection model to obtain the localization and identification results for the small targets.
In step S105, for the identification task, the following Focal loss is used to address the imbalance between positive and negative samples:
$$L_{FL} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t), \qquad p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise,} \end{cases}$$
where $L_{FL}$ is the Focal loss, $p$ is the prediction score, $y$ is the ground-truth label, $\alpha$ is the factor balancing positive and negative samples ($\alpha_t = \alpha$ for $y = 1$ and $1 - \alpha$ otherwise), and $\gamma$ is the modulating (adjustment) factor.
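A per-sample sketch of the Focal loss in Python. The defaults α = 0.25 and γ = 2 are the values commonly used with RetinaNet, not values taken from this document; `p` is the predicted probability of the positive class and `y` is 1 for a positive sample and 0 otherwise.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-12) -> float:
    """FL = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


easy, hard = focal_loss(0.9, 1), focal_loss(0.1, 1)
print(easy, hard)  # the well-classified sample contributes far less
```

With γ = 0 and α = 0.5 the expression reduces to half the binary cross-entropy, which is a convenient sanity check; raising γ shrinks the contribution of the many easy negatives.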
In addition, the total loss function $L$ of the detection model for localization and identification contains two weighting coefficients, both of which are hyperparameters.
Specifically, after model training is completed, test-set samples are input and the model outputs the average precision AP (averaged over IoU thresholds 0.50 to 0.95), AP50 (IoU threshold 0.50), AP75 (IoU threshold 0.75) and APS (IoU thresholds 0.50 to 0.95, objects smaller than 32 × 32 pixels), which are used to evaluate model performance.
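The IoU range in these metrics follows the COCO convention: AP and APS average the per-threshold AP over ten thresholds from 0.50 to 0.95 in steps of 0.05. The Python sketch below shows only that averaging; `ap_at` is a hypothetical callable standing in for the full per-threshold evaluation (matching and precision-recall interpolation), and `toy_ap` is invented data, not a result from this document.

```python
from typing import Callable, List


def coco_iou_thresholds() -> List[float]:
    """0.50, 0.55, ..., 0.95 (ten thresholds)."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def coco_ap(ap_at: Callable[[float], float]) -> float:
    """AP@[0.50:0.95]: mean of the AP values at the ten IoU thresholds."""
    thresholds = coco_iou_thresholds()
    return sum(ap_at(t) for t in thresholds) / len(thresholds)


def toy_ap(t: float) -> float:
    """Hypothetical detector whose AP falls linearly with stricter IoU."""
    return 0.8 - (t - 0.5)


print(coco_ap(toy_ap), toy_ap(0.50), toy_ap(0.75))  # AP, AP50, AP75
```

APS applies the same averaging while restricting the evaluation to objects with an area below 32 × 32 pixels.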
The invention provides a multi-level receptive field expansion method for small target detection. A Swin Transformer is introduced as the backbone network of the model, and its hierarchy, locality and translation invariance are used to extract the features of small targets. Based on the differing output features of each stage of the backbone network, a multi-level receptive field expansion network is designed to further process the backbone outputs, so that small-target information is not lost; in addition, the receptive field amplification module effectively expands the receptive field. According to task requirements, the structure of the receptive field amplification module in each layer can be flexibly adjusted to match the receptive fields required by targets of different scales and to obtain rich context information. Furthermore, the proposed combined loss of the GIoU loss and the BIoU loss improves target localization. Comparative tests show that the invention performs well on small target detection.
Referring to fig. 2, the present invention further provides a multi-level receptive field expanding system for small target detection, wherein the system comprises:
a pre-processing module to:
perform preprocessing suitable for small target detection on an input image from the COCO dataset;
a feature extraction module to:
introduce a Swin Transformer as the backbone network and use its hierarchical structure to extract multiple layers of features from the input image, each layer of features corresponding to one feature layer;
a network construction module to:
construct a multi-level receptive field feature fusion network whose receptive field feature amplification modules match the receptive fields required by each feature layer of the Swin Transformer and supplement the shallow prediction features, each feature layer having a plurality of receptive field feature amplification modules after matching;
a loss determination module to:
take a linear combination of the GIoU loss and the BIoU loss as the bounding box regression loss and use this loss to improve target localization;
a result output module to:
assign targets of different scales in the input image to feature layers with different receptive fields, and locate and identify small targets with the shallow prediction feature layers of the detection model to obtain the localization and identification results for the small targets.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (2)
1. A multi-level receptive field expansion method for small target detection, the method comprising the steps of:
step one, performing preprocessing suited to small target detection on an input image from the COCO dataset;
step two, introducing a Swin Transformer as the backbone network and using its hierarchical structure to extract multi-layer features from the input image, each layer of features corresponding to one feature layer;
step three, constructing a multi-level receptive field feature fusion network, and using its receptive field feature amplification modules to match the receptive field required by each feature layer of the Swin Transformer and to supplement the shallow prediction features, so that after matching each feature layer is provided with several receptive field feature amplification modules;
step four, taking the linear combination of the GIoU loss and the BIoU loss as the bounding box regression loss, and using this regression loss to strengthen target localization;
step five, distributing targets of different scales in the input image over feature layers with different receptive fields, and locating and recognizing small targets with the shallow prediction feature layer of the detection model to obtain the small-target localization and recognition results;
in step one, the preprocessing comprises:
designing a data enhancement strategy, namely rescaling the input image and using multi-scale training to increase the scale diversity of the samples;
randomly flipping input images from the COCO dataset horizontally for data augmentation, to improve the generalization ability of the model;
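The two augmentations above, random horizontal flipping and multi-scale rescaling, can be sketched as follows. This is a minimal illustration under assumptions the claim does not state: boxes use corner format (x1, y1, x2, y2) in pixels, the image is a plain list of pixel rows, the candidate short-side sizes, long-side cap and flip probability are hypothetical values, and the actual pixel resampling is left to an image library, so only the target size and the rescaled boxes are computed.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def hflip(image: List[list], boxes: Sequence[Box]) -> Tuple[List[list], List[Box]]:
    """Mirror an image (list of pixel rows) left-right and remap its boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # A point at x moves to width - x, so x1 and x2 swap roles.
    new_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, new_boxes


def pick_scale(height: int, width: int, short_sides: Sequence[int],
               max_side: int, rng: random.Random) -> float:
    """Multi-scale training: draw a target short side at random and return the
    resize factor, capped so that the long side does not exceed max_side."""
    target = rng.choice(short_sides)
    return min(target / min(height, width), max_side / max(height, width))


def augment(image, boxes, rng, short_sides=(480, 576, 672, 768, 800),
            max_side=1333, flip_prob=0.5):
    """Random horizontal flip, then a random rescale (hypothetical defaults).
    Returns the (possibly flipped) image, the rescaled boxes and the target size."""
    if rng.random() < flip_prob:
        image, boxes = hflip(image, boxes)
    h, w = len(image), len(image[0])
    s = pick_scale(h, w, short_sides, max_side, rng)
    scaled = [(x1 * s, y1 * s, x2 * s, y2 * s) for (x1, y1, x2, y2) in boxes]
    return image, scaled, (round(h * s), round(w * s))
```

Remapping the boxes together with the pixels is what keeps the flip label-preserving; drawing a new short side per sample is one common way to realize multi-scale training.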
the Swin Transformer has a four-stage structure and extracts four features of different scales and depths; the channel number of each is adjusted by a convolution to obtain the layer 2 to layer 5 features;
the multi-level receptive field feature fusion network outputs features at four different scales, namely the layer 2 to layer 5 output features;
the receptive field feature amplification modules on the four feature layers of the multi-level receptive field feature fusion network are indexed by feature layer, from layer 2 to layer 5;
the corresponding relationship is as follows:
where the symbols denote, respectively: the layer 2, layer 3, layer 4 and layer 5 output features; the layer 2, layer 3, layer 4 and layer 5 features; the receptive field feature amplification modules on the 2nd, 3rd, 4th and 5th feature layers; the number of receptive field feature amplification modules in a single feature layer; and upsampling by 2× nearest-neighbor interpolation;
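Because the relationship itself is not reproduced in this text, the following is only a hedged sketch of the two operations it names: a channel-adjusting convolution (assumed here to be 1 × 1) and upsampling by 2× nearest-neighbor interpolation. Combining them by element-wise addition, as in a feature pyramid network, is an assumption made for illustration and is not the patented formula. Feature maps are plain nested lists indexed [channel][row][column], and bias terms are omitted.

```python
from typing import List

Tensor = List[List[List[float]]]  # [channels][height][width]


def conv1x1(x: Tensor, weight: List[List[float]]) -> Tensor:
    """1x1 convolution: each output channel is a weighted sum of the input
    channels at the same pixel. weight has shape [out_channels][in_channels]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wc * x[c][i][j] for c, wc in enumerate(w_out))
              for j in range(w)] for i in range(h)] for w_out in weight]


def upsample2x_nearest(x: Tensor) -> Tensor:
    """2x nearest-neighbour upsampling: every pixel becomes a 2x2 block."""
    return [[[v for v in row for _ in range(2)] for row in ch for _ in range(2)]
            for ch in x]


def add(a: Tensor, b: Tensor) -> Tensor:
    """Element-wise sum of two tensors of identical shape."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def top_down_merge(finer: Tensor, coarser: Tensor) -> Tensor:
    """Assumed FPN-style merge: upsample the coarser map 2x and add it to the
    finer map, which must have the same channel count and twice the size."""
    up = upsample2x_nearest(coarser)
    if (len(up), len(up[0]), len(up[0][0])) != (len(finer), len(finer[0]), len(finer[0][0])):
        raise ValueError("shapes differ after 2x upsampling")
    return add(finer, up)
```

Nearest-neighbor upsampling has no learnable parameters, so under this assumed scheme all channel mixing happens in the 1 × 1 convolutions.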
the receptive field feature amplification module comprises several basic units; on the 4th feature layer, the layer 4 feature of the Swin Transformer backbone passes through basic unit 1 of the receptive field feature amplification module to give the first basic unit output feature, then through basic unit 2 to give the second basic unit output feature, and finally through basic unit 3, whose result is merged with the layer 4 backbone feature by a residual connection to give the third basic unit output feature for the layer 4 feature;
the corresponding expression is:
where the third basic unit output feature is the output feature of the first receptive field feature amplification module on the 4th feature layer;
where the symbols denote, respectively: the result of a convolution; a dilated (atrous) convolution with the given kernel size; the dilation rate of the dilated convolution; batch normalization; the activation function; a convolution including batch normalization and the activation function; and a dilated convolution including batch normalization and the activation function;
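How far a chain of basic units enlarges the receptive field can be checked with the standard arithmetic for dilated convolution: a k × k kernel with dilation rate d spans k + (k - 1)(d - 1) positions, and stride-1 layers stacked on one feature map add their spans minus one. The sketch assumes each basic unit is a 1 × 1 convolution followed by a 3 × 3 dilated convolution; the dilation rates 1, 2 and 3 and the feature-layer stride of 16 are hypothetical, since the original symbols are not preserved. Batch normalization, activation and the residual connection do not change the maximum span.

```python
from typing import Iterable, List, Tuple


def effective_kernel(k: int, dilation: int) -> int:
    """Positions spanned by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def span_on_feature_map(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field, in feature-map cells, of stride-1 layers stacked on
    one map. layers: (kernel_size, dilation_rate) pairs."""
    span = 1
    for k, d in layers:
        span += effective_kernel(k, d) - 1
    return span


def basic_unit(dilation: int) -> List[Tuple[int, int]]:
    """Assumed basic unit: a 1x1 convolution, then a 3x3 dilated convolution."""
    return [(1, 1), (3, dilation)]


# Hypothetical module: three basic units with dilation rates 1, 2 and 3.
module = basic_unit(1) + basic_unit(2) + basic_unit(3)
span = span_on_feature_map(module)   # 1 + 2 + 4 + 6 = 13 cells
stride = 16                          # hypothetical stride of the feature layer
extra_pixels = (span - 1) * stride   # receptive field growth in input pixels: 192
```

Three ordinary 3 × 3 convolutions would span only 7 cells, so under these assumed rates dilation nearly doubles the context with the same number of weights.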
the bounding box regression loss function is expressed as:
where the terms denote, respectively: the bounding box regression loss function; the GIoU loss function; the BIoU loss function; the predicted bounding box; the annotation (ground-truth) box; the bounding box position, given by the coordinates of the bounding box center point and the width and height of the bounding box; the area of the smallest box enclosing both the predicted bounding box and the annotation box; the Smooth L1 loss; and the intersection-over-union (IoU) calculation;
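Since the loss formula is not reproduced above, the sketch below shows one reading of the named terms. It uses the standard GIoU loss, 1 - GIoU with GIoU = IoU - (|C| - |A ∪ B|) / |C| and C the smallest enclosing box, and it assumes that "BIoU" denotes the Bounded IoU loss of Tychsen-Smith and Petersson, which applies a Smooth L1 penalty to per-coordinate IoU bounds computed from the center, width and height. The weights `lam_giou` and `lam_biou` and the Smooth L1 threshold `beta` are hypothetical, and boxes are assumed to be (center x, center y, width, height) with positive sizes.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h): center point, width, height


def to_corners(b: Box) -> Tuple[float, float, float, float]:
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def giou_loss(pred: Box, gt: Box) -> float:
    """GIoU loss = 1 - (IoU - (|C| - |A u B|) / |C|), C = smallest enclosing box."""
    px1, py1, px2, py2 = to_corners(pred)
    gx1, gy1, gx2, gy2 = to_corners(gt)
    inter = (max(0.0, min(px2, gx2) - max(px1, gx1))
             * max(0.0, min(py2, gy2) - max(py1, gy1)))
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    enclose = (max(px2, gx2) - min(px1, gx1)) * (max(py2, gy2) - min(py1, gy1))
    return 1.0 - (inter / union - (enclose - union) / enclose)


def smooth_l1(x: float, beta: float) -> float:
    """Smooth L1: quadratic below beta, linear above."""
    return 0.5 * x * x / beta if abs(x) < beta else abs(x) - 0.5 * beta


def bounded_iou_loss(pred: Box, gt: Box, beta: float = 0.2) -> float:
    """Assumed Bounded IoU loss: Smooth L1 of (1 - IoU upper bound) for center x,
    center y, width and height, each varied while the others are held at the target."""
    dx, dy = abs(gt[0] - pred[0]), abs(gt[1] - pred[1])
    costs = (
        1.0 - max(0.0, (gt[2] - 2 * dx) / (gt[2] + 2 * dx)),
        1.0 - max(0.0, (gt[3] - 2 * dy) / (gt[3] + 2 * dy)),
        1.0 - min(pred[2] / gt[2], gt[2] / pred[2]),
        1.0 - min(pred[3] / gt[3], gt[3] / pred[3]),
    )
    return sum(smooth_l1(c, beta) for c in costs)


def box_regression_loss(pred: Box, gt: Box, lam_giou: float = 1.0,
                        lam_biou: float = 1.0) -> float:
    """Linear combination of the two losses (weights are hypothetical)."""
    return lam_giou * giou_loss(pred, gt) + lam_biou * bounded_iou_loss(pred, gt)
```

The enclosing-box term is what lets GIoU penalize non-overlapping boxes: for pred = (0, 0, 2, 2) and gt = (4, 0, 2, 2) the IoU is 0, yet the GIoU loss is 4/3 rather than a flat 1.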
in step five, a Focal loss function is used in the recognition task to address the imbalance between positive and negative samples; the Focal loss function is expressed as:
where the terms denote, respectively: the Focal loss function; the predicted score; the ground-truth label; the weighting factor that balances positive and negative samples; and the modulating factor;
in step five, the total loss function of the detection model for localization and recognition is expressed as:
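The quantities named above match the standard binary Focal loss of Lin et al. (RetinaNet), FL = -α_t (1 - p_t)^γ log(p_t), where p_t is p for a positive label and 1 - p otherwise, and α_t is α or 1 - α likewise. The sketch assumes this standard form, since the patent's own formula is not reproduced here; α = 0.25 and γ = 2 are the defaults reported for RetinaNet and serve only as illustrative values, and the small `eps` clamp is an added safeguard against log(0).

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-12) -> float:
    """Binary Focal loss for one prediction score p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # positive/negative balancing weight
    # (1 - p_t) ** gamma down-weights well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
```

With γ = 0 the expression reduces to α-weighted cross-entropy; increasing γ shrinks the contribution of easy samples, so that the many easy negatives do not dominate training.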
2. A multi-level receptive field expansion system for small target detection, characterized in that the system applies a multi-level receptive field expansion method for small target detection as claimed in claim 1 above, the system comprising:
a pre-processing module to:
perform preprocessing suited to small target detection on input images from the COCO dataset;
a feature extraction module to:
introduce a Swin Transformer as the backbone network and use its hierarchical structure to extract multi-layer features from the input image, each layer of features corresponding to one feature layer;
a network construction module to:
construct a multi-level receptive field feature fusion network whose receptive field feature amplification modules match the receptive field required by each feature layer of the Swin Transformer and supplement the shallow prediction features, so that after matching each feature layer has several receptive field feature amplification modules;
a loss determination module to:
take the linear combination of the GIoU loss and the BIoU loss as the bounding box regression loss, and use this regression loss to improve target localization;
a result output module to:
distribute targets of different scales in the input image over feature layers with different receptive fields, and locate and recognize small targets with the shallow prediction feature layer of the detection model to obtain the small-target localization and recognition results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211209625.XA CN115272648B (en) | 2022-09-30 | 2022-09-30 | Multi-level receptive field expanding method and system for small target detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115272648A CN115272648A (en) | 2022-11-01 |
CN115272648B true CN115272648B (en) | 2022-12-20 |
Family ID: 83757963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211209625.XA Active CN115272648B (en) | 2022-09-30 | 2022-09-30 | Multi-level receptive field expanding method and system for small target detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115272648B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108288075A (*) | 2018-02-02 | 2018-07-17 | Shenyang University of Technology | A lightweight small target detection method based on improved SSD |
CN110321923A (*) | 2019-05-10 | 2019-10-11 | Shanghai University | Object detection method, system and medium based on feature-level fusion of receptive fields at different scales |
CN111767792A (*) | 2020-05-22 | 2020-10-13 | Shanghai University | Multi-person key point detection network and method based on classroom scene |
CN212062695U (*) | 2020-07-06 | 2020-12-01 | East China Jiaotong University | Multi-band MIMO antenna based on orthogonal layout |
WO2021185379A1 (*) | 2020-03-20 | 2021-09-23 | Changsha Intelligent Driving Institute Co., Ltd. | Dense target detection method and system |
CN114998696A (*) | 2022-05-26 | 2022-09-02 | Yanshan University | YOLOv3 target detection method based on feature enhancement and multi-level fusion |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019028725A1 (en) * | 2017-08-10 | 2019-02-14 | Intel Corporation | Convolutional neural network framework using reverse connections and objectness priors for object detection |
CN111695430B (*) | 2020-05-18 | 2023-06-30 | University of Electronic Science and Technology of China | Multi-scale face detection method based on feature fusion and visual receptive field network |
CN111967538B (*) | 2020-09-25 | 2024-03-15 | Beijing Kangfuzi Health Technology Co., Ltd. | Feature fusion method, device and equipment applied to small target detection and storage medium |
- 2022-09-30: application CN202211209625.XA filed in CN; patent CN115272648B, Active
Non-Patent Citations (3)
Title |
---|
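Two Dimensional Frequency-angle Domain Interpolation Method for Electromagnetic Scattering Analysis of Precipitation Particles; Jiaqi Chen et al.; IEEE; 2016-11-10; full text * |
Small target detection in images based on improved Faster R-CNN; Wang Kai et al.; 电视技术 (Video Engineering); 2019-10-25 (No. 20); full text * |
Object detection algorithm based on effective receptive field; Yang Jianxiu; 山西大同大学学报(自然科学版) (Journal of Shanxi Datong University, Natural Science Edition); 2020-08-18 (No. 04); full text * |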
Also Published As
Publication number | Publication date |
---|---|
CN115272648A (en) | 2022-11-01 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20240116. Address after: 230000 B-1015, Wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui. Patentee after: HEFEI MINGLONG ELECTRONIC TECHNOLOGY Co.,Ltd. Address before: No. 808, Shuanggang East Street, Nanchang Economic and Technological Development Zone, Jiangxi Province. Patentee before: East China Jiaotong University |