CN115272648B - Multi-level receptive field expanding method and system for small target detection - Google Patents

Multi-level receptive field expanding method and system for small target detection

Info

Publication number
CN115272648B
CN115272648B (application number CN202211209625.XA)
Authority
CN
China
Prior art keywords
layer
feature
receptive field
features
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211209625.XA
Other languages
Chinese (zh)
Other versions
CN115272648A (en)
Inventor
阙越
甘梦晗
刘志伟
张月园
熊汉卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Minglong Electronic Technology Co ltd
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University filed Critical East China Jiaotong University
Priority to CN202211209625.XA priority Critical patent/CN115272648B/en
Publication of CN115272648A publication Critical patent/CN115272648A/en
Application granted granted Critical
Publication of CN115272648B publication Critical patent/CN115272648B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G (Physics) / G06 (Computing; Calculating or Counting) / G06V (Image or Video Recognition or Understanding)
    • G06V 10/20: Image preprocessing
    • G06V 10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V 10/766: Recognition or understanding using pattern recognition or machine learning, using regression, e.g. by projecting features on hyperplanes
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 2201/07: Target detection (indexing scheme relating to image or video recognition or understanding)

Abstract

The invention provides a multi-level receptive field expansion method and system for small target detection. A Swin Transformer is introduced as the backbone network of the model, and its hierarchical structure, locality and translation invariance are used to extract the features of small targets. According to the different output features of each stage of the backbone network, a multi-level receptive field expansion network is designed to further process the backbone outputs, avoiding the loss of small-target information; in addition, the receptive field amplification module effectively expands the receptive field. According to task requirements, the structure of the receptive field amplification module on each layer is flexibly adjusted to match the receptive fields required by targets of different scales and to obtain rich context information. Furthermore, the proposed joint loss combining GIOU loss and BIOU loss is used to enhance target localization performance. Comparative experiments show that the invention performs well in small target detection.

Description

Multi-level receptive field expanding method and system for small target detection
Technical Field
The invention relates to the technical field of computer vision, in particular to a multi-level receptive field expanding method and a multi-level receptive field expanding system for small target detection.
Background
Object detection is an important research direction in the field of computer vision and the basis of other high-level vision tasks. Although object detection algorithms based on deep learning have developed rapidly, the detection of small targets remains a difficult point. In autonomous driving, accurately and quickly detecting small targets that affect traffic helps ensure the safety of the driver; in industrial automation, accurately locating and identifying small defects on materials helps ensure production efficiency; in satellite remote sensing, small target detection helps address problems such as illegal fishing boats and illegal cargo transfer. Therefore, developing a multi-level receptive field amplification network for small target detection has broad application value and academic research value.
In the field of object detection, the authoritative COCO dataset uses an absolute-size definition, which specifies objects of 32 × 32 pixels or less as small objects, and this standard is widely adopted. On the COCO dataset, small target detection accuracy is generally lower than that of normal-sized targets, so small target detection is more challenging. Specifically, the task of small target detection mainly faces four challenges: first, the features of small objects are difficult to extract, and discriminative feature information is hard to obtain from low-resolution small objects that lack visual information; second, due to downsampling, the features of a small object may collapse into a single point or even disappear in deep feature layers; third, receptive fields are mismatched, since a large receptive field suits large-object detection while a small receptive field benefits small-object detection; finally, small objects require higher localization accuracy, are strongly affected by bounding-box deviation, are difficult to localize precisely, and may be missed.
Therefore, it is necessary to provide a multi-level receptive field expanding method and system for small target detection to solve the above technical problems.
Disclosure of Invention
Therefore, embodiments of the present invention provide a multi-level receptive field expansion method and system for small target detection to solve the above technical problems.
The invention provides a multi-level receptive field expanding method for small target detection, wherein the method comprises the following steps:
firstly, preprocessing applicable to small target detection is carried out on an input image in a COCO data set;
step two, introducing a Swin Transformer as a backbone network, and performing feature extraction on the input image by using a hierarchical structure of the Swin Transformer to obtain multiple layers of features, wherein each layer of features corresponds to a feature layer;
thirdly, constructing a multi-level receptive field feature fusion network, matching the receptive fields required by each feature layer of the Swin Transformer through the receptive field feature amplification modules in the multi-level receptive field feature fusion network and supplementing shallow prediction features, wherein after matching, each feature layer is correspondingly provided with a plurality of receptive field feature amplification modules;
step four, taking the linear combination of the GIOU loss and the BIOU loss as the regression loss of the bounding box, and enhancing the target positioning effect according to the regression loss function of the corresponding bounding box;
and step five, distributing the targets with different scales in the input image on the feature layers with different receptive fields, and positioning and identifying the small targets by utilizing the shallow prediction feature layer in the detection model to obtain the positioning and identifying results of the small targets.
The multi-level receptive field expansion method for small target detection provided by the invention introduces a Swin Transformer as the backbone network of the model and uses its hierarchical structure, locality and translation invariance to extract the features of small targets; according to the different output features of each stage of the backbone network, a multi-level receptive field expansion network is designed to further process the backbone outputs, avoiding the loss of small-target information; in addition, the receptive field amplification module effectively expands the receptive field. According to task requirements, the structure of the receptive field amplification module on each layer is flexibly adjusted to match the receptive fields required by targets of different scales and to obtain rich context information; furthermore, the proposed joint loss of GIOU loss and BIOU loss is used to enhance target localization performance; comparative experiments show that the invention performs well in small target detection.
The multi-level receptive field expanding method for small target detection is characterized in that in the step one, the preprocessing comprises the following steps:
designing a data enhancement strategy, wherein the data enhancement strategy is as follows: scaling the image size of the input image and using multi-scale training to enhance sample scale diversity;
and randomly horizontally flipping the input images in the COCO data set for data augmentation so as to enhance the generalization capability of the model.
The multi-level receptive field expanding method for small target detection is characterized in that the Swin Transformer has a four-layer structure and correspondingly extracts four features of different scales and depths {C_2, C_3, C_4, C_5}; each C_i (i = 2, 3, 4, 5) is passed through a 1×1 convolution to adjust the number of channels, giving the features {F_2, F_3, F_4, F_5}; the multi-level receptive field feature fusion network outputs four output features of different scales {P_2, P_3, P_4, P_5}; and the receptive field feature amplification modules on the four feature layers of the multi-level receptive field feature fusion network are denoted R_i (i = 2, 3, 4, 5). The correspondence is as follows:

P_5 = R_5^n(F_5)
P_4 = R_4^n(F_4) + Up(P_5)
P_3 = R_3^n(F_3) + Up(P_4)
P_2 = R_2^n(F_2) + Up(P_3)

where P_2, P_3, P_4 and P_5 respectively denote the layer-2, layer-3, layer-4 and layer-5 output features; F_2, F_3, F_4 and F_5 respectively denote the layer-2, layer-3, layer-4 and layer-5 features; R_2, R_3, R_4 and R_5 respectively denote the receptive field feature amplification modules on the 2nd, 3rd, 4th and 5th feature layers; n denotes the number of receptive field feature amplification modules in a single feature layer; and Up(·) denotes upsampling by two-fold nearest-neighbor interpolation.
The multi-level receptive field expanding method for small target detection is characterized in that the receptive field feature amplification module comprises a plurality of basic units. Taking the 4th feature layer as an example, the layer-4 feature F_4 of the Swin Transformer backbone passes through the 1st basic unit B_1 of the receptive field feature amplification module to obtain the first basic unit output feature X_1, then passes through the 2nd basic unit B_2 to obtain the second basic unit output feature X_2, and finally passes through the 3rd basic unit B_3, which merges the layer-4 feature F_4 of the backbone through a residual connection, to obtain the third basic unit output feature X_3. The corresponding expressions are:

X_1 = B_1(F_4)
X_2 = B_2(X_1)
X_3 = B_3(X_2) + F_4

where the third basic unit output feature X_3 is the output feature of the first receptive field feature amplification module of the 4th feature layer.
The multi-level receptive field expanding method for small target detection is characterized in that the first basic unit output feature X_1 is expressed as:

X_1 = f_{3×3,r}(f_{1×1}(F_4)), with f_{1×1}(·) = δ(BN(Conv_{1×1}(·))) and f_{3×3,r}(·) = δ(BN(DConv_{3×3,r}(·)))

where Conv_{1×1} denotes the 1×1 convolution result, DConv_{3×3,r} denotes the dilated (hole) convolution with a 3×3 kernel, r denotes the dilation rate of the dilated convolution, BN denotes batch normalization, δ denotes the activation function, f_{1×1} denotes the 1×1 convolution including batch normalization and the activation function, and f_{3×3,r} denotes the 3×3 dilated convolution including batch normalization and the activation function.
The multi-level receptive field expanding method for small target detection is characterized in that the bounding box regression loss function is expressed as:

L_reg(b, b^gt) = L_GIOU(b, b^gt) + L_BIOU(b, b^gt)

where L_reg denotes the bounding box regression loss function, L_GIOU denotes the GIOU loss function, L_BIOU denotes the BIOU loss function, b denotes the predicted bounding box, b^gt denotes the annotation box, (x, y, w, h) denotes the position of the bounding box, (x, y) denotes the coordinates of the center point of the bounding box, w and h respectively denote the width and height of the bounding box, C denotes the minimum enclosing box area of the predicted bounding box and the annotation box, SmoothL1 denotes the Smooth L1 loss, and IoU denotes the overlap calculation.
In the fifth step, in the recognition task, a Focal loss function is used to address the imbalance between positive and negative samples; the corresponding Focal loss function is expressed as:

FL(p, y) = −α (1 − p)^γ log(p) if y = 1, and −(1 − α) p^γ log(1 − p) otherwise

where FL denotes the Focal loss function, p denotes the prediction score, y denotes the real label, α denotes the factor balancing positive and negative samples, and γ denotes the adjustment factor.
In the fifth step, the total loss function for localization and recognition by the detection model is expressed as:

L = FL + λ_1 · L_GIOU + λ_2 · L_BIOU

where L denotes the total loss function used by the detection model for localization and recognition, and λ_1 and λ_2 both denote hyper-parameters.
The invention also provides a multi-level receptive field expanding system for small target detection, wherein the system comprises:
a pre-processing module to:
preprocessing suitable for small target detection is carried out on an input image in the COCO data set;
a feature extraction module to:
introducing a Swin transform as a backbone network, and performing feature extraction on the input image by using a hierarchical structure of the Swin transform to obtain multiple layers of features, wherein each layer of features corresponds to a feature layer;
a network construction module to:
constructing a multi-level receptive field feature fusion network, matching the receptive fields required by each feature layer of the Swin Transformer through the receptive field feature amplification modules in the multi-level receptive field feature fusion network and supplementing the shallow prediction features, wherein after matching, each feature layer corresponds to a plurality of receptive field feature amplification modules;
a loss determination module to:
taking the linear combination of the GIOU loss and the BIOU loss as the regression loss of the bounding box, and enhancing the target positioning effect according to the regression loss function of the corresponding bounding box;
a result output module to:
and distributing the targets with different scales in the input image on the feature layers with different receptive fields, and positioning and identifying the small targets by utilizing the shallow prediction feature layer in the detection model to obtain the positioning and identifying results of the small targets.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of embodiments of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a multi-level receptive field expansion method for small target detection proposed by the present invention;
fig. 2 is a schematic structural diagram of a multi-level receptive field expanding system for small target detection according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a multi-level receptive field expanding method for small target detection, wherein the method includes the following steps:
s101, preprocessing applicable to small object detection is carried out on the input image in the COCO data set.
Specifically, most images in the COCO dataset come from everyday scenes and have complex backgrounds. The dataset contains 80 classes, and each image contains on average 3.5 classes and 7.7 instances. In the COCO dataset, targets with an area smaller than 32 × 32 pixels are defined as small targets, and small objects account for 41% of the instances.
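For reference, the small-object criterion stated above (area below 32 × 32 pixels) can be checked directly on COCO-style annotation dictionaries; the snippet below is a minimal Python sketch, and the field and variable names follow the COCO annotation format rather than anything defined in this patent.

    # Minimal sketch: select COCO-style annotations whose area classifies them as "small"
    SMALL_AREA_THRESHOLD = 32 * 32  # absolute-size definition used in the text

    def is_small_object(annotation: dict) -> bool:
        """Return True if the annotation's area is below 32x32 pixels."""
        return annotation.get("area", float("inf")) < SMALL_AREA_THRESHOLD

    annotations = [
        {"id": 1, "area": 500.0},   # small object (500 < 1024)
        {"id": 2, "area": 5000.0},  # medium or large object
    ]
    small = [a for a in annotations if is_small_object(a)]
    print(f"{len(small)} of {len(annotations)} annotations are small objects")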
In step S101, the preprocessing includes the steps of:
s1011, designing a data enhancement strategy, wherein the data enhancement strategy is as follows: the image size of the input image is scaled, using multi-scale training to enhance sample scale diversity.
In this step, the scaled image size is (480, 1333).
And S1012, performing data augmentation on the input images in the COCO data set by random horizontal flipping to enhance the generalization capability of the model.
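As a rough illustration of the preprocessing in S1011 and S1012, the sketch below applies a multi-scale resize and a random horizontal flip with torchvision. Interpreting the size (480, 1333) as a short side of at least 480 pixels with the long side capped at 1333 pixels is an assumption, and the extra short-side choices and the helper name are illustrative only.

    import random
    from PIL import Image
    import torchvision.transforms.functional as TF

    # Assumed multi-scale policy: pick a short-side target (480 is the smallest value
    # given in the text; the other values are illustrative), cap the long side at 1333.
    SHORT_SIDES = [480, 560, 640, 720, 800]
    MAX_LONG_SIDE = 1333

    def preprocess(image: Image.Image, flip_prob: float = 0.5) -> Image.Image:
        w, h = image.size
        short_side = random.choice(SHORT_SIDES)        # multi-scale training
        scale = short_side / min(w, h)
        scale = min(scale, MAX_LONG_SIDE / max(w, h))  # keep the long side <= 1333
        image = TF.resize(image, [round(h * scale), round(w * scale)])
        if random.random() < flip_prob:                # random horizontal flip
            image = TF.hflip(image)
        return image

In practice the bounding box annotations would need to be rescaled and flipped consistently with the image; that bookkeeping is omitted here.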
S102, introducing a Swin Transformer as a backbone network, and performing feature extraction on the input image by using a hierarchical structure of the Swin Transformer to obtain multiple layers of features, wherein each layer of features corresponds to a feature layer.
In step S102, the Swin Transformer has a four-layer structure. The preprocessed image is first input, and four features of different scales and depths {C_2, C_3, C_4, C_5} are correspondingly extracted; each C_i (i = 2, 3, 4, 5) is passed through a 1×1 convolution to adjust the number of channels, giving the features {F_2, F_3, F_4, F_5}. The multi-level receptive field feature fusion network is used to output four output features of different scales {P_2, P_3, P_4, P_5}, and the receptive field feature amplification modules on the four feature layers of the multi-level receptive field feature fusion network are denoted R_i (i = 2, 3, 4, 5). First, the layer-5 feature F_5 passes through n layer-5 receptive field feature amplification modules to obtain the layer-5 output feature P_5, expressed as P_5 = R_5^n(F_5). Next, the layer-5 output feature P_5 is upsampled by two-fold nearest-neighbor interpolation to obtain Up(P_5), which is fused with the feature obtained from the n layer-4 receptive field feature amplification modules to obtain the layer-4 output feature P_4, expressed as P_4 = R_4^n(F_4) + Up(P_5). In the same way, P_4 is upsampled and fused with the layer-3 feature F_3 to obtain the layer-3 output feature P_3, and the same operation is then used to obtain the layer-2 output feature P_2. Specifically, the correspondence is as follows:

P_5 = R_5^n(F_5)
P_4 = R_4^n(F_4) + Up(P_5)
P_3 = R_3^n(F_3) + Up(P_4)
P_2 = R_2^n(F_2) + Up(P_3)

where P_2, P_3, P_4 and P_5 respectively denote the layer-2, layer-3, layer-4 and layer-5 output features; F_2, F_3, F_4 and F_5 respectively denote the layer-2, layer-3, layer-4 and layer-5 features; R_2, R_3, R_4 and R_5 respectively denote the receptive field feature amplification modules on the 2nd, 3rd, 4th and 5th feature layers; n denotes the number of receptive field feature amplification modules in a single feature layer; and Up(·) denotes upsampling by two-fold nearest-neighbor interpolation.
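To make the top-down fusion above concrete, the following is a minimal PyTorch sketch assuming that each R_i^n is a stack of n amplification modules applied to the laterally adjusted feature F_i, that fusion with the upsampled higher-level output is element-wise addition, and that two-fold adjacent-sample interpolation corresponds to nearest-neighbor upsampling; the class and variable names are illustrative, not taken from the patent.

    import torch.nn as nn
    import torch.nn.functional as F

    class MultiLevelFusion(nn.Module):
        """Top-down fusion: P5 = R5(F5); P_l = R_l(F_l) + Up(P_{l+1}) for l = 4, 3, 2."""

        def __init__(self, in_channels, out_channels, rfa_modules):
            # in_channels: channel counts of backbone features C2..C5
            # rfa_modules: four module stacks [R2, R3, R4, R5], assumed to be given
            super().__init__()
            self.lateral = nn.ModuleList(
                [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])
            self.rfa = nn.ModuleList(rfa_modules)

        def forward(self, feats):  # feats = [C2, C3, C4, C5]
            f = [lat(c) for lat, c in zip(self.lateral, feats)]  # F2..F5
            p5 = self.rfa[3](f[3])
            p4 = self.rfa[2](f[2]) + F.interpolate(p5, scale_factor=2, mode="nearest")
            p3 = self.rfa[1](f[1]) + F.interpolate(p4, scale_factor=2, mode="nearest")
            p2 = self.rfa[0](f[0]) + F.interpolate(p3, scale_factor=2, mode="nearest")
            return [p2, p3, p4, p5]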
S103, constructing a multi-level receptive field feature fusion network, matching the receptive fields required by each feature layer of the Swin Transformer through the receptive field feature amplification modules in the multi-level receptive field feature fusion network and supplementing the shallow prediction features, wherein after matching, each feature layer corresponds to a plurality of receptive field feature amplification modules.
In the invention, the receptive field feature amplification module comprises a plurality of basic units. Taking the 4th feature layer as an example, the layer-4 feature F_4 of the Swin Transformer backbone passes through the 1st basic unit B_1 of the receptive field feature amplification module to obtain the first basic unit output feature X_1, then passes through the 2nd basic unit B_2 to obtain the second basic unit output feature X_2, and finally passes through the 3rd basic unit B_3, which merges the layer-4 feature F_4 of the backbone through a residual connection, to obtain the third basic unit output feature X_3. The corresponding expressions are:

X_1 = B_1(F_4)
X_2 = B_2(X_1)
X_3 = B_3(X_2) + F_4

where the third basic unit output feature X_3 is the output feature of the first receptive field feature amplification module of the 4th feature layer.
As a supplementary note, the other receptive field feature amplification modules on the 4th feature layer all perform the same operation as described above, taking the output of the previous receptive field feature amplification module as the input feature. The receptive field feature amplification module on the 4th feature layer has 3 basic units, while the receptive field amplification modules on the 5th, 3rd and 2nd feature layers have 4, 3 and 1 basic units respectively, operating in the same way as on the 4th feature layer.
Further, the first basic unit output feature X_1 is expressed as:

X_1 = f_{3×3,r}(f_{1×1}(F_4)), with f_{1×1}(·) = δ(BN(Conv_{1×1}(·))) and f_{3×3,r}(·) = δ(BN(DConv_{3×3,r}(·)))

where Conv_{1×1} denotes the 1×1 convolution result, DConv_{3×3,r} denotes the dilated (hole) convolution with a 3×3 kernel, r denotes the dilation rate of the dilated convolution, BN denotes batch normalization, δ denotes the activation function, f_{1×1} denotes the 1×1 convolution including batch normalization and the activation function, and f_{3×3,r} denotes the 3×3 dilated convolution including batch normalization and the activation function.
Additionally, the dilation rate of each basic unit is carefully designed. To avoid the loss of detailed information caused by the checkerboard effect arising from the discontinuity of the sampled data, different dilation rates are set for the basic units on different feature layers, so as to make full use of the information and match the receptive fields required by targets of different scales. For the 5th feature layer, the dilation rates of the basic units are set to 1, 3, 9 and 9; for the 4th feature layer, to 1, 3 and 9; for the 3rd feature layer, to 1, 2 and 3; and for the 2nd feature layer, to 1.
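A rough PyTorch rendering of one receptive field feature amplification module, as described above, could chain basic units (each a 1×1 convolution followed by a 3×3 dilated convolution, both with batch normalization and an activation) and add the module input back through a residual connection; the use of ReLU as the activation and the class names below are assumptions.

    import torch.nn as nn

    class BasicUnit(nn.Module):
        """1x1 conv + BN + ReLU followed by 3x3 dilated conv + BN + ReLU."""
        def __init__(self, channels, dilation):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=dilation, dilation=dilation),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

        def forward(self, x):
            return self.block(x)

    class RFAmplificationModule(nn.Module):
        """Stack of basic units with a residual connection from the module input."""
        def __init__(self, channels, dilations):  # e.g. (1, 3, 9) for feature layer 4
            super().__init__()
            self.units = nn.Sequential(*[BasicUnit(channels, d) for d in dilations])

        def forward(self, x):
            return self.units(x) + x  # residual connection merging the module input

    # Per-layer dilation rates as stated in the text
    LAYER_DILATIONS = {5: (1, 3, 9, 9), 4: (1, 3, 9), 3: (1, 2, 3), 2: (1,)}

A feature layer would then chain n such modules, each taking the previous module's output as its input.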
And S104, taking the linear combination of the GIOU loss and the BIOU loss as the bounding box regression loss, and enhancing the target localization effect according to the corresponding bounding box regression loss function.
In step S104, the bounding box regression loss function is expressed as:

L_reg(b, b^gt) = L_GIOU(b, b^gt) + L_BIOU(b, b^gt)

where L_reg denotes the bounding box regression loss function, L_GIOU denotes the GIOU loss function, L_BIOU denotes the BIOU loss function, b denotes the predicted bounding box, b^gt denotes the annotation box, (x, y, w, h) denotes the position of the bounding box, (x, y) denotes the coordinates of the center point of the bounding box, w and h respectively denote the width and height of the bounding box, C denotes the minimum enclosing box area of the predicted bounding box and the annotation box, SmoothL1 denotes the Smooth L1 loss, and IoU denotes the overlap calculation.
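For illustration, the sketch below pairs the standard GIoU loss with a placeholder BIoU term; the exact BIoU formulation is not reproduced in this text, so the SmoothL1-based stand-in, and the assumption that boxes are given in (x1, y1, x2, y2) format, are for illustration only.

    import torch
    import torch.nn.functional as F

    def giou_loss(pred, target, eps=1e-7):
        """Standard GIoU loss for (N, 4) boxes in (x1, y1, x2, y2) format."""
        x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
        x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        union = area_p + area_t - inter
        iou = inter / (union + eps)
        # Smallest enclosing box C of the predicted box and the annotation box
        cx1 = torch.min(pred[:, 0], target[:, 0]); cy1 = torch.min(pred[:, 1], target[:, 1])
        cx2 = torch.max(pred[:, 2], target[:, 2]); cy2 = torch.max(pred[:, 3], target[:, 3])
        area_c = (cx2 - cx1) * (cy2 - cy1) + eps
        return (1.0 - iou + (area_c - union) / area_c).mean()

    def biou_loss_placeholder(pred, target, beta=1.0):
        """Hypothetical SmoothL1-based stand-in; the BIoU definition is not shown here."""
        return F.smooth_l1_loss(pred, target, beta=beta)

    def regression_loss(pred, target):
        """Joint bounding-box regression loss: GIoU term plus BIoU term."""
        return giou_loss(pred, target) + biou_loss_placeholder(pred, target)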
And S105, distributing the targets with different scales in the input image on the feature layers with different receptive fields, and positioning and identifying the small targets by utilizing the shallow prediction feature layer in the detection model to obtain the positioning identification result of the small targets.
In step S105, in the recognition task, the following Focal loss function is used to address the imbalance between positive and negative samples:

FL(p, y) = −α (1 − p)^γ log(p) if y = 1, and −(1 − α) p^γ log(1 − p) otherwise

where FL denotes the Focal loss function, p denotes the prediction score, y denotes the real label, α denotes the factor balancing positive and negative samples, and γ denotes the adjustment factor.
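A compact sketch of the focal loss in its standard binary form, with α balancing positive and negative samples and γ as the adjustment (focusing) factor, is shown below; the default values α = 0.25 and γ = 2.0 are common choices rather than values taken from this text.

    import torch

    def focal_loss(pred_score, target, alpha=0.25, gamma=2.0, eps=1e-7):
        """Binary focal loss; pred_score are probabilities in (0, 1), target is 0/1."""
        pred_score = pred_score.clamp(eps, 1.0 - eps)
        p_t = torch.where(target == 1, pred_score, 1.0 - pred_score)
        alpha_t = torch.where(target == 1,
                              torch.full_like(pred_score, alpha),
                              torch.full_like(pred_score, 1.0 - alpha))
        return (-alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t)).mean()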
In addition, the total loss function for localization and recognition by the detection model is expressed as:

L = FL + λ_1 · L_GIOU + λ_2 · L_BIOU

where L denotes the total loss function used by the detection model for localization and recognition, and λ_1 and λ_2 both denote hyper-parameters.
Specifically, after model training is completed, the test-set samples are input, and the output average precision values AP (IoU threshold 0.50-0.95), AP50 (IoU threshold 0.50), AP75 (IoU threshold 0.75) and APS (IoU threshold 0.50-0.95, objects smaller than 32 × 32 pixels) are obtained and used to evaluate model performance.
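Evaluation against the metrics listed above is commonly done with the pycocotools COCOeval API; a minimal sketch, assuming COCO-format ground truth in instances_val.json and detections exported to results.json (both file names are placeholders), is:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("instances_val.json")       # ground-truth annotations (placeholder path)
    coco_dt = coco_gt.loadRes("results.json")  # detections in COCO result format
    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints AP (0.50:0.95), AP50, AP75 and APS/APM/APL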
The multi-level receptive field expansion method for small target detection provided by the invention introduces a Swin Transformer as the backbone network of the model and uses its hierarchical structure, locality and translation invariance to extract the features of small targets; according to the different output features of each stage of the backbone network, a multi-level receptive field expansion network is designed to further process the backbone outputs, avoiding the loss of small-target information; in addition, the receptive field amplification module effectively expands the receptive field. According to task requirements, the structure of the receptive field amplification module on each layer is flexibly adjusted to match the receptive fields required by targets of different scales and to obtain rich context information; furthermore, the proposed joint loss of GIOU loss and BIOU loss is used to enhance target localization performance; comparative experiments show that the invention performs well in small target detection.
Referring to fig. 2, the present invention further provides a multi-level receptive field expanding system for small target detection, wherein the system comprises:
a pre-processing module to:
preprocessing suitable for small target detection is carried out on an input image in the COCO data set;
a feature extraction module to:
introducing a Swin transform as a backbone network, and performing feature extraction on the input image by using a hierarchical structure of the Swin transform to obtain multiple layers of features, wherein each layer of features corresponds to a feature layer;
a network construction module to:
constructing a multi-level receptive field feature fusion network, matching the receptive fields required by each feature layer of the Swin Transformer through the receptive field feature amplification modules in the multi-level receptive field feature fusion network and supplementing the shallow prediction features, wherein after matching, each feature layer corresponds to a plurality of receptive field feature amplification modules;
a loss determination module to:
taking the linear combination of the GIOU loss and the BIOU loss as the regression loss of the bounding box, and enhancing the target positioning effect according to the regression loss function of the corresponding bounding box;
a result output module to:
and distributing the targets with different scales in the input image on the feature layers with different receptive fields, and positioning and identifying the small targets by utilizing the shallow prediction feature layer in the detection model to obtain the positioning and identifying results of the small targets.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (2)

1. A multi-level receptive field expansion method for small target detection, the method comprising the steps of:
firstly, preprocessing applicable to small target detection is carried out on an input image in a COCO data set;
step two, introducing a Swin Transformer as a backbone network, and performing feature extraction on the input image by using a hierarchical structure of the Swin Transformer to obtain multiple layers of features, wherein each layer of features corresponds to a feature layer;
thirdly, constructing a multi-level receptive field feature fusion network, matching the receptive fields required by each feature layer of the Swin Transformer through the receptive field feature amplification modules in the multi-level receptive field feature fusion network and supplementing shallow prediction features, wherein after matching, each feature layer is correspondingly provided with a plurality of receptive field feature amplification modules;
step four, taking the linear combination of the GIOU loss and the BIOU loss as the regression loss of the bounding box, and strengthening the target positioning effect according to the regression loss function of the corresponding bounding box;
step five, distributing the targets with different scales in the input image on feature layers with different receptive fields, and positioning and identifying the small targets by utilizing a shallow prediction feature layer in the detection model to obtain the positioning identification result of the small targets;
in the first step, the pretreatment comprises the following steps:
designing a data enhancement strategy, wherein the data enhancement strategy is as follows: scaling an image size of an input image, using multi-scale training to enhance sample scale diversity;
randomly horizontally flipping the input images in the COCO data set to perform data augmentation so as to enhance the generalization capability of the model;
the Swin transducer is correspondingly provided with a four-layer structure, and four extraction features with different scales and different depths are correspondingly extracted
Figure 939823DEST_PATH_IMAGE001
Wherein, in the step (A),
Figure 271578DEST_PATH_IMAGE002
meridian/channel
Figure 27044DEST_PATH_IMAGE003
Convolution adjusts the channel number to obtain the characteristics
Figure 760645DEST_PATH_IMAGE004
Wherein, in the step (A),
Figure 576154DEST_PATH_IMAGE002
multi-level reception field feature fusion network for outputting output features of four different scales
Figure 395206DEST_PATH_IMAGE005
Wherein, in the step (A),
Figure 688784DEST_PATH_IMAGE006
multilevel receptive field featuresThe receptive field characteristic amplification module on the four characteristic layers in the converged network is represented as
Figure 276891DEST_PATH_IMAGE007
Wherein, in the process,
Figure 528881DEST_PATH_IMAGE006
the corresponding relationship is as follows:
Figure 835229DEST_PATH_IMAGE008
Figure 932498DEST_PATH_IMAGE009
Figure 372182DEST_PATH_IMAGE010
Figure 529493DEST_PATH_IMAGE011
wherein the content of the first and second substances,
Figure 323137DEST_PATH_IMAGE012
respectively representing the layer 2 output characteristic, the layer 3 output characteristic, the layer 4 output characteristic and the layer 5 output characteristic,
Figure 958518DEST_PATH_IMAGE013
respectively representing layer 2 features, layer 3 features, layer 4 features and layer 5 features,
Figure 521217DEST_PATH_IMAGE014
respectively showing the amplification of the characteristics of the receptive field on the 2 nd, 3 rd, 4 th and 5 th characteristic layersThe module is provided with a plurality of modules,
Figure 849430DEST_PATH_IMAGE015
represents the number of the receptive field feature amplification modules in a single feature layer,
Figure 864791DEST_PATH_IMAGE016
representing upsampling using a double neighboring sample interpolation;
the receptive field characteristic amplification module comprises a plurality of basic units, and in the 4 th characteristic layer, the 4 th layer characteristic of Swin transducer as a backbone network
Figure 772704DEST_PATH_IMAGE017
Base unit 1 of the receptive field feature amplification module
Figure 314544DEST_PATH_IMAGE018
Obtaining a first base unit output characteristic
Figure 689025DEST_PATH_IMAGE019
Then passes through the 2 nd basic unit
Figure 316315DEST_PATH_IMAGE020
Obtaining a second base unit output characteristic
Figure 168864DEST_PATH_IMAGE021
Finally, via the 3 rd basic unit
Figure 830790DEST_PATH_IMAGE022
Merging layer 4 features of the backbone network by residual connection
Figure 376172DEST_PATH_IMAGE017
Third base unit output characteristics for layer 4 characteristics
Figure 756338DEST_PATH_IMAGE023
The corresponding expression is:
Figure 412578DEST_PATH_IMAGE024
Figure 929010DEST_PATH_IMAGE025
Figure 243880DEST_PATH_IMAGE026
wherein the third basic unit outputs characteristics
Figure 783446DEST_PATH_IMAGE023
The output characteristic of the first receptive field characteristic amplification module of the 4 th characteristic layer is obtained;
first basic unit output characteristics
Figure 305694DEST_PATH_IMAGE019
Is expressed as:
Figure 411054DEST_PATH_IMAGE027
wherein the content of the first and second substances,
Figure 360555DEST_PATH_IMAGE028
represent
Figure 121838DEST_PATH_IMAGE029
The result of the convolution is,
Figure 447777DEST_PATH_IMAGE030
representing a convolution kernel of
Figure 610905DEST_PATH_IMAGE031
The convolution of the holes of (a) with (b),
Figure 731308DEST_PATH_IMAGE032
which represents the spreading ratio of the convolution of the hole,
Figure 979886DEST_PATH_IMAGE033
which represents the normalization of the batch,
Figure 843937DEST_PATH_IMAGE034
it is shown that the activation function is,
Figure 861572DEST_PATH_IMAGE035
the representation comprising batch normalization and activation functions
Figure 418455DEST_PATH_IMAGE029
The result of the convolution is,
Figure 154330DEST_PATH_IMAGE036
representing a representation containing batch normalization and activation functions
Figure 353230DEST_PATH_IMAGE031
Performing hole convolution;
the bounding box regression loss function is expressed as:
Figure 490950DEST_PATH_IMAGE037
wherein, the first and the second end of the pipe are connected with each other,
Figure 218735DEST_PATH_IMAGE038
a bounding box regression loss function is represented,
Figure 176327DEST_PATH_IMAGE039
represents the GIOU loss function,
Figure 644830DEST_PATH_IMAGE040
represents the BIOU loss function,
Figure 637056DEST_PATH_IMAGE041
a prediction bounding box is represented that is,
Figure 270163DEST_PATH_IMAGE042
a reference frame is shown in the drawing,
Figure 715051DEST_PATH_IMAGE043
the position of the bounding box is indicated,
Figure 724595DEST_PATH_IMAGE044
Figure 571328DEST_PATH_IMAGE045
the coordinates representing the center point of the bounding box,
Figure 640916DEST_PATH_IMAGE046
respectively representing the width and height of the bounding box,
Figure 573099DEST_PATH_IMAGE047
represents the minimum bounding box area of the prediction bounding box and the annotation box,
Figure 386335DEST_PATH_IMAGE048
indicating a loss of Smooth L1,
Figure 353154DEST_PATH_IMAGE049
representing a contact ratio calculation;
in the fifth step, in the identification task, a Focal local function is used for solving the problem of imbalance of positive and negative samples, and the corresponding Focal local function is expressed as follows:
Figure 124801DEST_PATH_IMAGE050
wherein, the first and the second end of the pipe are connected with each other,
Figure 809860DEST_PATH_IMAGE051
the local function is represented by the following formula,
Figure 161207DEST_PATH_IMAGE052
a prediction score is represented by a number of points,
Figure 982532DEST_PATH_IMAGE053
the presence of a real label is indicated,
Figure 128343DEST_PATH_IMAGE054
representing the number of balanced positive and negative samples,
Figure 769540DEST_PATH_IMAGE055
represents a regulatory factor;
in the fifth step, the total loss function corresponding to the positioning and identification performed by the detection model is represented as:
Figure 924577DEST_PATH_IMAGE056
wherein the content of the first and second substances,
Figure 597480DEST_PATH_IMAGE057
representing the total loss function of the detection model for positioning and identifying,
Figure 179771DEST_PATH_IMAGE058
both represent hyper-parameters.
2. A multi-level receptive field expansion system for small target detection, characterized in that the system applies the multi-level receptive field expansion method for small target detection as claimed in claim 1, the system comprising:
a pre-processing module to:
preprocessing suitable for small target detection is carried out on an input image in the COCO data set;
a feature extraction module to:
introducing a Swin transform as a backbone network, and performing feature extraction on the input image by using a hierarchical structure of the Swin transform to obtain multiple layers of features, wherein each layer of features corresponds to a feature layer;
a network construction module to:
constructing a multi-level receptive field feature fusion network, matching the receptive fields required by each feature layer of the Swin Transformer through the receptive field feature amplification modules in the multi-level receptive field feature fusion network and supplementing the shallow prediction features, wherein after matching, each feature layer corresponds to a plurality of receptive field feature amplification modules;
a loss determination module to:
the linear combination of the GIOU loss and the BIOU loss is used as the regression loss of the boundary frame, and the target positioning effect is enhanced according to the corresponding regression loss function of the boundary frame;
a result output module to:
and distributing the targets with different scales in the input image on the feature layers with different receptive fields, and positioning and identifying the small targets by utilizing the shallow prediction feature layer in the detection model to obtain the positioning and identifying results of the small targets.
CN202211209625.XA 2022-09-30 2022-09-30 Multi-level receptive field expanding method and system for small target detection Active CN115272648B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211209625.XA CN115272648B (en) 2022-09-30 2022-09-30 Multi-level receptive field expanding method and system for small target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211209625.XA CN115272648B (en) 2022-09-30 2022-09-30 Multi-level receptive field expanding method and system for small target detection

Publications (2)

Publication Number Publication Date
CN115272648A CN115272648A (en) 2022-11-01
CN115272648B true CN115272648B (en) 2022-12-20

Family

ID=83757963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211209625.XA Active CN115272648B (en) 2022-09-30 2022-09-30 Multi-level receptive field expanding method and system for small target detection

Country Status (1)

Country Link
CN (1) CN115272648B (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN111695430B (en) * 2020-05-18 2023-06-30 电子科技大学 Multi-scale face detection method based on feature fusion and visual receptive field network
CN111967538B (en) * 2020-09-25 2024-03-15 北京康夫子健康技术有限公司 Feature fusion method, device and equipment applied to small target detection and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288075A (en) * 2018-02-02 2018-07-17 沈阳工业大学 A kind of lightweight small target detecting method improving SSD
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion
WO2021185379A1 (en) * 2020-03-20 2021-09-23 长沙智能驾驶研究院有限公司 Dense target detection method and system
CN111767792A (en) * 2020-05-22 2020-10-13 上海大学 Multi-person key point detection network and method based on classroom scene
CN212062695U (en) * 2020-07-06 2020-12-01 华东交通大学 Multi-band MIMO antenna based on orthogonal layout
CN114998696A (en) * 2022-05-26 2022-09-02 燕山大学 YOLOv3 target detection method based on feature enhancement and multi-level fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jiaqi Chen et al., "Two Dimensional Frequency-angle Domain Interpolation Method for Electromagnetic Scattering Analysis of Precipitation Particles", IEEE, 2016-11-10, full text *
Wang Kai et al., "Small Target Detection in Images Based on Improved Faster R-CNN", Video Engineering (《电视技术》), 2019-10-25 (No. 20), full text *
Yang Jianxiu, "Object Detection Algorithm Based on Effective Receptive Field", Journal of Shanxi Datong University (Natural Science Edition), 2020-08-18 (No. 04), full text *

Also Published As

Publication number Publication date
CN115272648A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
Xue et al. A fast detection method via region‐based fully convolutional neural networks for shield tunnel lining defects
CN112884064B (en) Target detection and identification method based on neural network
Saberironaghi et al. Defect detection methods for industrial products using deep learning techniques: a review
CN110852316A (en) Image tampering detection and positioning method adopting convolution network with dense structure
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
CN112488025B (en) Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion
CN110009622B (en) Display panel appearance defect detection network and defect detection method thereof
CN115439442A (en) Industrial product surface defect detection and positioning method and system based on commonality and difference
CN113591719A (en) Method and device for detecting text with any shape in natural scene and training method
CN114462469B (en) Training method of target detection model, target detection method and related device
Choi et al. Deep learning based defect inspection using the intersection over minimum between search and abnormal regions
Yasmin et al. Small obstacles detection on roads scenes using semantic segmentation for the safe navigation of autonomous vehicles
Liang et al. Car detection and classification using cascade model
Ni et al. Toward high-precision crack detection in concrete bridges using deep learning
CN113255555A (en) Method, system, processing equipment and storage medium for identifying Chinese traffic sign board
CN115272648B (en) Multi-level receptive field expanding method and system for small target detection
Kwon et al. Context and scale-aware YOLO for welding defect detection
Das et al. Object Detection on Scene Images: A Novel Approach
Ghahremani et al. Toward robust multitype and orientation detection of vessels in maritime surveillance
CN115294103B (en) Real-time industrial surface defect detection method based on semantic segmentation
Dong et al. Intelligent pixel-level pavement marking detection using 2D laser pavement images
CN112330683B (en) Lineation parking space segmentation method based on multi-scale convolution feature fusion
CN117475262B (en) Image generation method and device, storage medium and electronic equipment
Anand et al. WA net: Leveraging Atrous and Deformable Convolutions for Efficient Text Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240116

Address after: 230000 B-1015, wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui.

Patentee after: HEFEI MINGLONG ELECTRONIC TECHNOLOGY Co.,Ltd.

Address before: No. 808, Shuanggang East Street, Nanchang Economic and Technological Development Zone, Jiangxi Province

Patentee before: East China Jiaotong University